








Vol. XXI OCTOBER, 1937 


THE JOURNAL OF 
APPLIED PSYCHOLOGY 


Edited by 
James P. Porter, Ohio University 
Athens, Ohio 





TABLE OF CONTENTS 


JOHN G. DARLEY. Scholastic Achievement and Measured
Maladjustment

NORMAN J. POWELL. Item Evaluation in a Civil Service
Examination

FRANK R. ELLIOTT. Memory Effects from Poster, Radio
and Television Modes of Advertising an Exhibit

ALBERT J. HARRIS. The Relative Significance of Measures
of Mechanical Aptitude, Intelligence, and Previous
Scholarship for Predicting Achievement in Dental

Z. C. DICKINSON. Validity and Independent Criteria in
Tests and Ratings

DOUGLAS FRYER. Variability in Automatic Mental Performance
with Uniform Intent

ORAN W. EAGLESON. The Success of Sixty Subjects in
Attempting to Recognize Their Handwriting

C. R. ATWELL AND F. L. WELLS. Wide Range Multiple
Choice Vocabulary Tests








GRIFFITH W. WILLIAMS AND JANET LINES. An Evaluation
of the Ferguson Form Boards and the Derivation of
New Age and Grade Norms. Part I

BEATRICE CANDEE AND MILTON BLUM. Report of a Study

ROBERT B. SELOVER AND JAMES P. PORTER. Prediction of
the Scholarship of Freshman Men by Tests of Listening
and Learning Ability

PSYCHOLOGICAL CORPORATION. A Study of Public Relations
and Social Attitudes











SCHOLASTIC ACHIEVEMENT AND 
MEASURED MALADJUSTMENT* 


JOHN G. DARLEY 
The General College, University of Minnesota 


In the clinical practice of student personnel work, some undetermined
part of student mortality is attributed to
extra-educational maladjustments that prevent students 

from using their unit abilities satisfactorily. While individ- 
ual cases may be found wherein health, financial, social or emo- 
tional problems are judged to be directly responsible for aca- 
demic deficiencies, little statistical evidence is available to 
demonstrate this relationship satisfactorily. A Utopian ideal 
for education might lie in the following principle: no student 
should be dropped from an educational institution for failing
work until it can be proved conclusively that such failing work 
is directly due to inappropriate ability or to willful lack of 
effort. As a corollary to this, a further tenet would be: edu- 
cation must build its curricular offerings in such a way that 
some consonance exists between the activities in which the 
student must exercise his abilities and the broad life activities 
for which he is being prepared. In the absence of these two 
viewpoints, student personnel work partly concerns itself on 
the margin of education with preventive or curative work to 
the end that the students may arrive in the usual class-room 
situation in the optimal condition for the learning process. 
But to provide research verification, no matter how limited, 
of the clinical situations in which extra-educational maladjustments
lead to academic deficiencies may serve as an impetus
to further research along these lines.

* These data are taken from ‘‘An Analysis of Attitude and Adjustment
Tests With Special Reference to Conditions of Change in Attitudes and
Adjustments,’’ by John G. Darley, a Ph.D. thesis on file at the University
of Minnesota Library. Professor D. G. Paterson was the author’s major
advisor.

In an analogous situation, one finds a lack of clear-cut evidence
for any relation between measured occupational interests
and academic achievement. The Strong Vocational Interest
Tests, standardized on criteria of occupational success,
do not correlate highly with subject-matter achievement. Ad- 
mittedly, lack of agreement between criteria of occupational 
success and criteria of academic success partially accounts for 
the limited relation between class-room achievement and 
measured occupational interest. If the relation were high, on 
the other hand, one might question the unitary nature of the 
two areas of behavior being measured. The clinical resolution 
of the difficulty lies in accepting as separate and valid data 
the extensive research on interest measurement, including sta- 
bility of interests, and the equally extensive research on the 
relation of general and specific abilities to achievement, and 
so applying these research data to the individual case that total 
plans strike a balance that gives due weighting to both factors. 

In attitude and adjustment scales, one additional premise 
must be tested. It is generally accepted that present-day in- 
terest tests measure something relevant to total occupational 
and educational adjustment. But is it to be assumed that atti- 
tudes and adjustments as measured are relevant to life adjust- 
ment problems, and further are the attitudes and adjustments 
as measured the same ones that the clinician isolates as condi- 
tions underlying academic deficiencies? Internal evidence 
would answer the first part of this question affirmatively; the
apparently disparate nature of psychiatric or clinical, and measurement
or test, definitions of maladjustment confuses the
answer to the latter part of the question. Rundquist and
Sletto,¹ Bell,² and Williamson and Darley³ all cite evidence
showing a lack of relation between their respective tests and
either ability or achievement in various school populations.

1 Rundquist, E. A. & Sletto, R. F.: Scoring Instructions for the Minnesota
Scale for the Survey of Opinions, Univ. of Minnesota Press, 1936.

2 Bell, H. M.: Manual for the Adjustment Inventory, Stanford Univ.
Press, 1934.

3 Williamson, E. G. & Darley, J. G.: Manual for the Minnesota Inventories
of Social Attitudes, The Psychological Corporation, New York City,
1937.

Within the General College, in which the present study was 
conducted, one further experimental difficulty exists. It has 
been the policy of the College to lay down no arbitrary stand- 
ards of class-room achievement by which failing or low-grade 
students may be identified. Thus one cannot study a group of 
deficiency students defined in terms of a stable reference point 
of achievement. These two limitations—are the tests relevant, 
and is the group homogeneous in respect to the basic character- 
istic of academic deficiency—are not insurmountable. 

During the academic year 1935-36, 326 men and 217 women
in the General College were tested and re-tested over an average
interval of 9.2 months, on the following attitude and adjustment
scales: the Minnesota Scale for the Survey of Opinions,¹
the Adjustment Inventory,² and the Minnesota Inventories
of Social Attitudes.³ These schedules yield twelve attitude
or adjustment scores. The Minnesota Scale for the Survey
of Opinions is scored for: morale; inferiority feelings;
family adjustments; attitudes toward the law; economic conservatism;
attitudes toward education. The Adjustment Inventory
is scored for home, health, social, and emotional adjustments.
The Minnesota Inventories of Social Attitudes
yield separate scores for social preference and social behavior.

These tests covered about one and one half hours of a four 
to five hour basic testing program for clinical counseling ser- 
vices in the college. Ability, aptitude, interest, and achieve- 
ment tests were also given in the first testing program. 

Table I summarizes the intercorrelations between two mea- 
sures of academic ability and each of the twelve attitude and 
adjustment scales. In only one instance do these intercorrela- 
tions vary significantly from a correlation of 0.00, and that is 
between the family scale of the Minnesota Survey of Opinions
and the College Aptitude Test. Since this relation does not 
recur in the case of the American Council Psychological Examination,
it is safe to assume that ability, as measured, and
attitudes and adjustments, as measured, are probably unrelated
in these data.


TABLE I

Correlation Between Scale Scores and Ability Measures for 291 Men
and 199 Women

S.E.(ρ=0) for 291 men = .06          S.E.(ρ=0) for 199 women = .07

                            AMERICAN COUNCIL       MINNESOTA COLLEGE
                            PSYCHOLOGICAL          APTITUDE TEST
                            EXAMINATION
                            Men       Women        Men       Women
RUNDQUIST-SLETTO:
  Morale                   -.030     -.005         .046      .125
  Inferiority              -.029      .031        -.046      .003
  Family                    .125      .117         .204      .200
  Law                      -.018      .096         .070      .118
  Economic Conservatism     .032      .007         .100     -.013
  Education                 .033     -.026         .038      .095
BELL:
  Home                      .060      .076         .076      .170
  Health                    .124      .004         .081      .159
  Social                    .013     -.039        -.003     -.080
  Emotional                 .088     -.006        -.004      .005
SOCIAL PREFERENCES         -.095     -.045        -.021     -.015
SOCIAL BEHAVIOR            -.018     -.040        -.005      .018
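
The standard errors quoted with Table I follow from the formula the paper gives for the standard error of a zero correlation, 1/√(N−1). A minimal sketch in Python is given here as an illustration; the function names are hypothetical and are not the author's, and the ratio of a coefficient to its standard error is only one conventional way of judging departure from zero.

```python
import math


def se_zero_correlation(n: int) -> float:
    """Standard error of r when the true correlation is zero: 1 / sqrt(N - 1)."""
    if n < 2:
        raise ValueError("at least two cases are needed")
    return 1.0 / math.sqrt(n - 1)


def critical_ratio(r: float, n: int) -> float:
    """Number of standard errors by which an observed r departs from zero."""
    return r / se_zero_correlation(n)


# Group sizes taken from the heading of Table I.
for label, n in (("men", 291), ("women", 199)):
    print(label, round(se_zero_correlation(n), 2))

# Family scale against the College Aptitude Test for men (r = .204).
print(round(critical_ratio(0.204, 291), 1))
```

Rounded to two places, the function returns .06 for 291 cases and .07 for 199 cases, the values printed with the table; the coefficient of .204 for men lies about three and a half standard errors from zero.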


TABLE II

Means, Standard Deviations and Correlations Between Grades and American
Council Psychological Examination for Fall, Winter
and Spring Quarters

                     GRADE AVERAGE       AMERICAN COUNCIL     CORRELATION
                     IN SIGMA UNITS      RAW SCORES           BETWEEN GRADES
QUARTER       N                                               AND AMERICAN
                     Mean     Sigma      Mean      Sigma      COUNCIL
MEN:
  Fall       330     50.44    7.37       147.00    42.93      .536 ± .05*
  Winter     316     50.90    7.48       149.43    42.82      .364 ± .06
  Spring     280     50.87    7.55       150.03    42.50      .344 ± .06
WOMEN:
  Fall       210 (?) 50.52    8.25       152.19    43.62      .435 ± .07
  Winter     194     51.14    7.15       154.07    43.14      .427 ± .07
  Spring     176     50.95    7.52       155.80    49.48      .357 ± .07

* S.E.(ρ=0) = 1/√(N−1)


Table II shows the statistical constants and correlation be- 
tween grade averages for the fall, winter and spring quarters, 
and the raw scores on the American Council Psychological 
Examination. These correlations show the extent of the rela- 
tion between ability and achievement found in this population. 


TABLE III

Coefficients of Correlation Between Students’ Average Grades and Each
Original Attitude and Adjustment Score

                              FALL             WINTER            SPRING
                           Men    Women     Men    Women      Men    Women
N                          301     201      285     185       251     168

RUNDQUIST-SLETTO:
  Morale                  .044    .005     .017    .041      .097    .034
  Inferiority             .099    .088     .087    .061      .107    .185
  Family                  .110    .061     .071    .151      .142    .117
  Law                     .082    .005     .090    .012      .123    .011
  Economic Conservatism   .197    .217     .233    .234      .213    .215
  Education               .029    .093     .024    .094      .068    .160
BELL:
  Home                    .182    .284     .120    .221      .068    .168
  Health                  .057    .170     .076    .037      .030    .008
  Social                  .098    .120     .125    .050      .088    .075
  Emotional               .112    .156     .176    .065      .125    .181
SOCIAL PREFERENCES        .103    .253     .149    .288      .107    .232
SOCIAL BEHAVIOR           .037    .143     .060    .089      .057    .134

S.E.(ρ=0) = 1/√(N−1)     ±.06    ±.07     ±.06    ±.07      ±.06    ±.08























Table III summarizes the relation between students’ average 
grades and each of the twelve attitude or adjustment scores. 
Both sexes show a clear relation between economically radical 
viewpoints and higher grade records. Women show a ten- 
dency for lower grades accompanying extreme sociability. 
Statistically these are the only two relationships that stand 
out in the data. Beyond this statistical level of significance, 
however, are certain interesting trends, which time prevents
discussing here. 

With twelve attitude and adjustment scores involved, and 
with the slight relations found between ability and adjustments
or attitudes, and between achievement and adjustments
or attitudes, the partial correlation technique would seem un- 
wieldy and unpromising as a further statistical step. But a 
relation among these three sets of variables may be found with 
another approach. This approach is again based on the as- 
sumption that the present attitude and adjustment scales 
measure some aspects of behavior that may affect academic 
achievement. 

Consequently, a tabulation was first made, for men and for 
women, of the frequency of occurrence of statistically significant
deviate scores on the twelve scales. On the Minnesota
Scale for the Survey of Opinions, this meant all standard
deviation scores of 60 or above. On the Bell Adjustment Inventory
this meant cases with numerical scores in the ‘‘Unsatisfactory’’
and ‘‘Very Unsatisfactory’’ range. On the
Minnesota Inventories of Social Attitudes standard deviation 
scores of 60 and above and 40 and below are included. These 
distributions are given below: 





[Table: Number of Statistically Significant Deviate Scores, tabulated
for men and for women; the frequencies are illegible in the source.]
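
The tabulation rule described above can be sketched in Python. This is an illustration under stated assumptions, not the author's procedure: the record layout (one dictionary per student), the helper names, and the ‘‘Average’’ label used for a non-deviate Bell rating are all hypothetical, and the two students are made up.

```python
from collections import Counter

# Scale names follow the paper; the dictionary layout is an assumption.
SURVEY_OF_OPINIONS = ("morale", "inferiority", "family", "law",
                      "economic_conservatism", "education")
BELL = ("home", "health", "social", "emotional")
SOCIAL_ATTITUDES = ("social_preferences", "social_behavior")
BELL_DEVIATE_RANGES = {"Unsatisfactory", "Very Unsatisfactory"}


def count_deviate_scores(record: dict) -> int:
    """Count the scales on which one student shows a deviate score."""
    count = 0
    # Survey of Opinions: standard deviation scores of 60 or above.
    count += sum(record[s] >= 60 for s in SURVEY_OF_OPINIONS)
    # Bell Adjustment Inventory: the two unfavorable descriptive ranges.
    count += sum(record[s] in BELL_DEVIATE_RANGES for s in BELL)
    # Inventories of Social Attitudes: deviate in either direction.
    count += sum(record[s] >= 60 or record[s] <= 40 for s in SOCIAL_ATTITUDES)
    return count


def tabulate(records: list) -> Counter:
    """Frequency of students by number of deviate scores (0 through 12)."""
    return Counter(count_deviate_scores(r) for r in records)


# Two made-up students, for illustration only.
adjusted = {**{s: 50 for s in SURVEY_OF_OPINIONS},
            **{s: "Average" for s in BELL},
            **{s: 50 for s in SOCIAL_ATTITUDES}}
deviate = {**adjusted, "morale": 63, "law": 60,
           "home": "Unsatisfactory", "social_behavior": 38}

print(tabulate([adjusted, deviate]))
```

The first made-up student has no deviate scores and the second has four, so they would fall into the two extreme groups that the paper goes on to compare.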














For more detailed analysis two groups were chosen from 
each sex: men and women with no statistically significant 
deviate scores; and men and women with four or more statis- 
tically significant deviate scores. Within these four groups at 
the two extremes, the relation between measured ability and 
grade records was studied for the quarter within which the
original attitude and adjustment testing was completed. The
assumption is this: if measured maladjustment or radicalism 
affects academic achievement adversely, then, other things 
being equal, cases where extensive measured maladjustment or 
radicalism is present should be working below their capacities;
and cases free from any measured maladjustment or radicalism 
should be working at a level nearer their capacities. If the 
assumption is correct, the correlations between ability and 
achievement will differ for the extreme groups chosen, being 
higher for the group with no statistically significant attitude 
or adjustment scores. 
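
The paper compares the coefficients of the extreme groups by inspection. One conventional way to put a number on such a comparison, which the source does not use, is Fisher's z transformation for two correlations from independent groups. The sketch below assumes that technique; the function name and the input values are illustrative only and are not taken from the tables.

```python
import math


def compare_independent_correlations(r1: float, n1: int,
                                     r2: float, n2: int) -> float:
    """Normal deviate for the difference between two independent correlations.

    Each r is converted to Fisher's z (the inverse hyperbolic tangent), whose
    sampling variance is approximately 1 / (N - 3).
    """
    if n1 < 4 or n2 < 4:
        raise ValueError("each group needs at least four cases")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


# Illustrative values: an adjusted group and a maladjusted group of 53 each.
deviate = compare_independent_correlations(0.60, 53, 0.30, 53)
print(round(deviate, 2))
```

The comparison applies only to groups that share no members; it would not be appropriate between an extreme group and the total group that contains it.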

Table IV presents the data necessary to test the hypothesis. 

As may be seen in Table IV the hypothesis is still tenable, 
but its statistical verification is complicated by the unexpected 
factor of the hitherto lacking co-variation between ability and 
attitude or adjustment scores in the extreme maladjustment 
group; this finding of the higher ability of the extremely mal- 
adjusted group is probably a function of the local situation. 
It is to be remembered, also, that for each sex both extreme 
groups, as sub-groups, also appear in the statistics for ‘‘all 
men’’ and ‘‘all women.’’ 

The trend of evidence, however, substantiates one part of the 
original hypothesis: the absence of statistically significant 
deviate scores in attitudes or adjustments is accompanied by a 
higher predictive correlation coefficient between ability and 
achievement than is found in the total group. But, for men at 
least, the converse does not hold; men with measured malad- 
justments or radical tendencies show an equally impressive 
correlation between ability and achievement. Both groups of 
men, even though their abilities differ, are working more 
nearly in line with their abilities than is the total group of 
which they are a part. Among the women, the trends are in 
perfect accord with expectations based on the original hypothesis;
the extreme maladjusted group shows the lowest relation
between ability and achievement; the adjusted group shows 
the highest relation.


TABLE IV

Statistical Constants and Correlations for Ability and Achievement
Measures [words illegible] Adjustment Scores

                                         AMERICAN COUNCIL   GRADE RECORD IN   CORRELATION
                                                            SIGMA SCORES      BETWEEN GRADES
                                                                              AND AMERICAN
GROUP                               N    Mean     Sigma     Mean     Sigma    COUNCIL

Men with no statistically
  significant deviate scores        69   145.36   38.02     46.87    7.04     .653
All men (fall quarter)             330   147.00   42.93     50.44    7.37     .536
Men with 4 or more statistically
  significant deviate scores        ?0   160.49   ?         53.05    5.?3     .640
Women with no statistically
  significant deviate scores        40   145.83   40.41     48.78    5.?5     .576
Women with 4 or more statistically
  significant deviate scores        53   166.61   ?         54.00    5.0?     .?98

[The table is printed sideways in the source; ? marks digits that cannot be read.]




A further check showed a critical difference in median num- 
ber of counseling interviews with men in the maladjusted and 
adjusted groups; no such difference was found in the two ex- 
treme groups of women. This uncontrolled factor may account 
for achievement so in line with capacity as shown by the men 
in the maladjusted group. 

A median of four interviews was held with the women in all 
three groups, and with all men and the men in the adjusted 
group; but a median of six interviews was held with the men 
in the extreme maladjusted group. Since these interviews 
were aimed at diagnosing and treating all student problems, it 
is presumable that they could affect the motivation of these 
men in such a way as to bring achievement more in line with 
capacity, particularly since the men in this group were of 
higher ability, where the goal of transfer to professional train- 
ing also acts as an incentive to achievement. 

While it is impossible to state from these data that measured 
maladjustment and radicalism lead to student mortality, since 
General College students are not dropped under uniform 
standards of deficient work, it does appear that measured mal- 
adjustment or radicalism may depress achievement below the 
level to be expected from ability, unless affected by some coun- 
ter-stimulant. Furthermore, since measured maladjustment 
alone is being considered, clinically isolated problems not 
also isolated by available attitude and adjustment tests may 
logically be expected to operate in similar fashion to upset an
expected relation between ability and achievement. One of 
these comes quickly to mind because of its relative frequency:
failing work resulting from a choice of curriculum not consonant
with the individual’s abilities or interests.














ITEM EVALUATION IN A CIVIL SERVICE 
EXAMINATION* 


NORMAN J. POWELL, Examiner 
Municipal Civil Service Commission, New York City


It is the purpose of this paper to discuss several of the more
important characteristics of a good test item in a civil
service examination. The analysis will center about an
examination given recently in New York City for the position
of Prison Keeper and will include a discussion of one technique 
of item analysis. 

Among the more important attributes of a good test item are
the following:

1. Validity. Items should actually measure occupational 
ability, should discriminate between good and poor workers. 
This means that candidates’ scores in the item under consider- 
ation should show a high correlation with ability in the job for 
which the examination is given. The point here is that sub- 
stantial correspondence of scores in specific items with a recog- 
nized measure of ability signifies that the particular items 
are making a substantial contribution to the discriminational 
capacity of the test. 

2. Reliability. This is defined as the accuracy or consistency 
of measurement. Several methods are available for its measurement.
One procedure for the determination of
reliability consists of correlating response to the test on a first 
administration with that given on a second administration, 
provided that the two testing periods are well spaced in time. 
The temporal factor is of importance because a gap of a year 
or more, for example, between two successive administrations 

* The writer is deeply indebted to Miss May B. Upshaw, Director of 
Examinations of the New York Civil Service Commission, for her encour- 
agement and advice in the preparation of this report. 


















of an examination item may yield differing results as a function 
of variables other than consistency of the measure applied. 
The candidate himself may have changed radically in the period 
of one year with respect to his informational level, for instance. 
On the other hand, separation between the two test administra- 
tions of only a few days may introduce memory effects. The 
point must also be made that a valid test is necessarily a reliable 
test, though a reliable test need not be valid. The measurement 
of weight furnishes an example. A valid measure of weight is 
one which measures the earth’s pull upon a body. A spring 
balance would provide a valid measure and would be reliable 
because successive applications of the measuring instrument 
would give results practically identical with preceding readings 
of the scale. Suppose a footrule to be used in some consistent 
manner as a measure of weight. Since repeated measures 
would give very nearly the same results, the measuring instru- 
ment is, by definition, reliable. It is, however, not valid be- 
cause a footrule does not measure weight and the measure 
yielded would exhibit a decidedly low correlation with another, 
recognized index of weight. 

3. Uniqueness. Separate test items should measure separate 
aspects of the ability in question, i.e., intercorrelations among
the test items should be low.¹ We may illustrate this characteristic
by an extreme case of two items in which candidates
who give correct responses to one item always give correct re- 
sponses to the second and candidates who answer the first item 
incorrectly always answer the second incorrectly. Obviously, 
there is no need for both items, since a score on one will be 
entirely adequate to predict accurately the score on both.

1 That intercorrelations should be low is not true in all cases. Clark
Hull (Aptitude Testing, World Book Co., 1928, pp. 450-456) gives an
excellent discussion of this problem.

4. Correctness of key answer. This is a point of considerable
practical importance in civil service measurement. Theoretically,
in measurement there is no objection to giving credit
for an incorrect answer and penalizing for a correct answer if
such a procedure serves to discriminate effectively among candidates.
This condition would arise if candidates with superior
ability gave wrong answers to a particular item and inferior
candidates gave correct answers. The impracticability of this
procedure nevertheless renders it necessary that credit be given
only for correct answers. The nature of a correct answer is,
however, not always obvious, particularly in cases of difficult
judgment items. Several complications are involved here. It
is, in the first place, necessary to have difficult items in an examination
if there is to be any attempt to differentiate among
superior levels of ability. Second, the less obvious the key
answer the greater is the number of appeals from ratings.
Third is the question of who is to be the judge of the correctness
of a key answer. This problem becomes particularly acute
in items designed to measure reasoning ability, judgment, and
the like.
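
The first three attributes can each be expressed as a correlation. The sketch below is a minimal illustration in Python, not the examiner's own procedure: the paper does not say which coefficient was used, so an ordinary product-moment correlation is assumed, and the six candidates, their item responses, job ratings and total scores are all made up.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least two cases")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant list has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Made-up responses of six candidates (1 = key answer chosen, 0 = not).
item_a = [1, 1, 0, 0, 1, 0]
item_b = [1, 1, 0, 0, 1, 0]   # always agrees with item_a
item_c = [1, 0, 1, 0, 1, 0]

# Hypothetical criterion: a rating of each candidate's ability on the job.
job_rating = [9, 8, 4, 5, 7, 3]

# Hypothetical total scores on two administrations well spaced in time.
first_administration = [62, 55, 40, 48, 58, 37]
second_administration = [60, 57, 42, 45, 59, 39]

validity_a = pearson_r(item_a, job_rating)          # item against criterion
reliability = pearson_r(first_administration, second_administration)
overlap_ab = pearson_r(item_a, item_b)              # duplicate items
overlap_ac = pearson_r(item_a, item_c)              # largely separate items

print(round(validity_a, 3), round(reliability, 3),
      round(overlap_ab, 3), round(overlap_ac, 3))
```

Items a and b reproduce the extreme case described under uniqueness: their intercorrelation is perfect, so either one predicts the score on both, whereas item c overlaps item a only slightly and so adds separate information.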

The first and fourth of the attributes listed, validity and cor- 
rectness of key response, will be discussed in some detail. We 
begin with an analysis of the problem of correctness of key 
response and cite the following item numbered 89 in the recent 
Prison Keeper examination in New York City. 

An inmate has just returned a book to the prison library. 
The maximum time for which a book can be borrowed is one 
month. The book contains 241 pages. The inmate is then 
handed a list from which to select another book. He hands the 
list back to the librarian and points to a title saying ‘‘I want 
that.’’ The title he selects is the same book as he has just 
returned. The best inference is that

(A) the inmate is interested in reading the book again (B) 
the inmate did not finish reading the book the first time (C) the 
inmate can hardly read (D) the librarian is very efficient. 
Written directions at the beginning of the examination stated 
‘‘Four possible answers are suggested to complete each sentence.
One of the answers (A), (B), (C), and (D) best completes
each sentence.’’ Candidates were required to write on
a specially prepared answer sheet the capital letter preceding 
the best answer. The key answer was choice (C). A number 
of candidates appealed from their ratings on this item, citing 
most frequently either (A) or (B) as the best answer. 
















The immediate question, then, is: How are we to determine 
which of the four alternatives is the best answer? On what 
basis are we to decide whether the key answer is right or wrong? 
One method which seems reasonable and has been widely used 
in psychological test analysis is the submission of the test item 
to competent authority. But what is competent authority? Is 
the typical warden or the professor of criminology and penology 
competent authority? It seems apparent that the criteria to 
be set up for the evaluation of the competence of authority are 
matters for the judgment of the test maker and are matters of 
opinion. It is the opinion of the writer that neither penological 
experience nor information as such entitles a judge to be termed 
competent in this instance. For this item, intelligence is the 
prime attribute for a suitable judge. Of course, penological 
experience is no bar to the possession of intelligence, but the 
point to be made is that it is intelligence and not many years 
of experience which will help a judge to determine the best 
answer. It may be remarked, incidentally, that the inclusion of
such a test item makes the assumption that reasoning ability 
and judgment are desirable attributes of a good prison keeper. 
Whether this assumption holds is not relevant to the question 
of the determination of the best answer to the item, but is 
another problem for which a specific research procedure is 
necessary. 

In addition to intelligence, the competent judge must possess 
a certain minimal fund of information about the assumptions, 
functions and structure of the typical item which he is judging. 
It would be disconcerting to find, for example, that various 
judges to whom a certain test item has been submitted give 
such responses as: ‘‘I would say that short answer questions
are a satisfactory method of measurement provided the answers
include one which is clearly, but not too obviously, the right
answer as well as one very obviously absurd reply.’’ ‘‘If this
examination is for persons who have had no previous experience
in this work it is obvious that any answer given to this question
is based on no experience whatsoever, and probably has little
or no value.’’ ‘‘None of the four suggested alternatives is
entirely correct, so that all would have to be rejected.’’

These several hypothetical remarks all indicate inadequate 
understanding of the make-up of objective examinations. In 
consequence, the injection by these judges of extraneous, false 
assumptions into their interpretations of the test items invali- 
dates their judgments. The desirable judge, in this case, there- 
fore, is one who is not only intelligent but holds no incorrect 
opinions about tests which may affect his judgment detrimen- 
tally. It is, perhaps, reasonable to require that the competent 
authority be not only of superior intelligence but be well in- 
formed on the subject of testing. The following is one analysis 
of the means by which a judge may deduce (C) as the best 
answer to item 89 of the examination for prison keeper. 

‘‘The inmate has selected the book again. It is fair to believe
that, ordinarily, a month provides sufficient time to complete
the reading of a book which is 241 pages long. On the
other hand, the customary procedure in renewing the loan of 
a library book is to borrow the book again without going 
through the motions of examining a book list. That the same 
book is borrowed again may be laid either to chance or to pur- 
pose. Arguing against chance is the fact that a particular title 
of what may reasonably be supposed to be a list of at least 
several hundred books is chosen. The choice is almost certainly 
purposive. But the technique of selection employed by the 
inmate furnishes data for considering that the inmate can 
hardly read. The inmate points to a title and says, ‘I want
that.’ It may be that literate persons on rare occasions choose
books in this way. Certain it is that the statistical probabilities 
are overwhelmingly against such a method of selection by a 
literate individual. Why, then, should the inmate have chosen 
the same book? The inmate is able to read at an extremely low 
level of efficiency. He has been able to recognize a certain con- 
figuration of print and points to the pattern which he has seen 
for a month in his cell. 

‘‘We are propelled to the conclusion that the inmate has not
only failed to finish reading the book but that he is practically
unable to read. It is significant that, apart from the fact that
the inmate has twice borrowed a specific book, the test item 
provides no data to support the hypothesis that the inmate is 
able to read. At no time does the inmate actually read some- 
thing. On the contrary, the several points enumerated argue 
that the inmate can hardly read, if at all.

‘‘The evidence against the contention that ‘the inmate did
not finish reading the book the first time’ is the fact that he had 
a month in which to read a book only 241 pages long and the 
facts already given regarding his behavior, extremely unusual 
for a literate. Inmates in prisons have about four hours of 
leisure on working days, most of Saturday and most of Sunday 
available for reading purposes. At the very least, the inmate 
had about 80 hours in which to read 241 pages. Unquestion- 
ably, this represents an adequate period in which to complete 
the reading of a book in which any inmate is interested, since it 
must be held that the inmate is interested in reading the book if 
he wishes to borrow it again. It may be maintained, however, 
that the book is, perhaps, a text in mathematics or another 
volume of similar difficulty. The probabilities of such a con- 
tingency are nil. The average inmate has had little or no 
schooling. Most books borrowed are fiction. Therefore, it is 
most probable that the book was a story of some kind. Even 
if the book were non-fiction, most of the material borrowed is 
inspirational and swiftly read. It must be borne in mind that 
analysis of the test item must be in terms of probabilities, since 
the most reasonable of the four alternatives is to be chosen. 

‘‘There are no data to support the possibility that ‘the librarian
is very efficient.’ In fact, the failure of the librarian
to inquire into the odd set of conditions described argues for 
a diagnosis of inefficiency. 

‘‘In summary, it is considered that (C) is the best answer 
because there is no evidence to indicate that the inmate is fully 
literate and there is some evidence that he can hardly read. 
The argument that if the inmate can hardly read, it follows 
that he has not finished reading the book, is met by pointing 
out that (C) includes (B), is more specific, and is a better 





500 NORMAN J. POWELL 


answer because more information is furnished. Further, the 
implication of (B) is that the inmate is literate. 

‘‘It is noted that the writer actually witnessed an incident
of the type portrayed in the test item. Conversation with the 
inmate in the case yielded the information that the inmate was 
practically illiterate. 

‘‘It is also noted that anyone who has had intra-institutional
prison experience will testify to the proclivity of many inmates
for strutting about with a book under the arm, despite an inability
to read. Compensation is a technical term and ‘show
off’ a popular term for the mechanism involved in such a
procedure.’’

The following is an example of another item which prompted 
a number of appeals: 


A certain committee found that over 90 per cent of the mur- 
ders in the United States are committed by the use of pistols. 
It follows that 

(A) almost all murders are caused by the possession of 
pistols, (B) 90 per cent of murders can be eliminated by eliminating
the sale and use of pistols, (C) the pistol is a mechanical
aid to crime, (D) no information is available with regard to 
the way murders happen. 





In this case, the key answer is (C). Some persons, however, 
were unable to recognize the justification for this answer despite 
an exposition of the line of reasoning which leads to this choice. 
Certainly, it is necessary for proper civil service administration
that candidates be convinced they are being treated fairly
and honestly, yet experience indicates that it is well nigh im- 
possible to convince some persons that a particular key answer 
is correct. To argue that candidates do not wish to be con- 
vinced of the incorrectness of their responses merely dodges the 
issue. It seems a fact that this is not true in all cases. Yet 
items which literally invite controversy from failures as to 
correctness are requisite for adequate measurement. Difficult 
items are necessary if the very superior person is to be distinguished
from the superior person. The omission of such items
is unfair to the very superior persons and lowers the validity
of the examination. 

The problem of validity of the test items in this examination 
is that of differentiating satisfactorily between persons who 
would make good prison keepers and those who would make 
poor workers. The question which then arises is: How are we 
to decide which of the candidates in this examination would 
make good prison keepers? Probably the best way to measure 
ability to be a prison keeper is to measure ability on the job. 
We may appoint all candidates and set up criteria with which 
to evaluate their work. We may attempt to rate the keeper’s 
ability to handle inmates, to maintain discipline, to stimulate 
inmates toward good behavior in the institution, and after 
their release, etc. It is obvious that many of the factors upon
which one would rely in setting about to measure ability on the 
job are themselves not readily susceptible to measurement, nor 
will there be complete agreement on the nature of the factors 
to be set up. Further, it is neither desirable nor possible to 
appoint to the job of prison keeper all those persons who take 
a civil service examination. Past record of candidates may be 
evaluated, but such an evaluation is necessarily crude, since, 
among other reasons, quantity of work of a given type must be 
the prime consideration in rating experience because of the 
virtual impossibility of obtaining a valid index of quality of
work. A physical examination will yield a precise score but 
its validity is open to serious question. No specialist in prison 
work would maintain that the modern keeper should be above 
all a person of health and brawn and no one would hold that 
the ability of prison keepers will be measured only by their 
physical efficiency. On the other hand, a written examination, 
unless it has been validated prior to its administration, possesses
serious flaws as a criterion measure. Because of the
necessity for the utmost secrecy, civil service examinations in 
New York City are not validated by means of an experimental 
tryout before they are given to candidates for positions. A 
combination of ratings in experience and education, physical
efficiency and performance in a written examination, the content
of which at least looks as though it provides a fair sample
of measures of the characteristics and abilities which a good 
prison keeper should have, would seem to be the best criterion 
available under the circumstances with which to evaluate prison 
keepers’ capacity. 

In the present case then, three measures were combined and 
weighted as follows: 


Physical strength and agility test, weight 2
Training and experience, weight 3
Written examination, weight 5

The weights are those which were actually employed in com- 
puting the standing of candidates on the final list of eligibles. 
They were set up by Mr. James J. Flannelly, examiner in the 
Civil Service Commission of New York City after consultation 
with experts in the field of prison work. We make the assump- 
tion that candidates who were highest on the list of eligibles 
would make the most competent workers and that candidates 
who were lowest on the list would make less competent workers. 
We assume that standing on the list is perfectly correlated with 
ability. This assumption is inaccurate to an unknown extent 
because of the fact that the measures employed had not been 
validated. It is, however, the best which can be made and it is 
quite reasonable to expect that there is considerable correspon- 
dence between the ability which is purported to be measured by 
the tests and the measures employed. This assumption seems 
reasonable because the subject matter of the tests looks reason- 
able and pertinent and consumed many hours of testing time. 
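
The weighting described above can be sketched in a few lines. This is only an illustration: the article gives the weights (2, 3, and 5) but not the scale of each component or how the total was normalized, so the 0 to 100 component scores and the division by the sum of the weights are assumptions, and the function name is hypothetical.

```python
# Weights reported in the article for the three measures.
WEIGHTS = {"physical": 2, "experience": 3, "written": 5}


def composite_score(scores):
    """Weighted average of the three component ratings.

    Assumption: every component is rated on the same scale (here 0-100)
    and the weighted sum is divided by the total weight (10).
    """
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical candidate, not taken from the article.
candidate = {"physical": 80, "experience": 70, "written": 90}
print(composite_score(candidate))
```

Because the written examination carries half of the total weight, a candidate's standing on the list is dominated by the written score under this scheme.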

We rephrase our problem. It is required to ascertain
whether persons who stand high on the list tend to answer item
89 correctly and whether persons who do not stand high on the 
list do not tend to answer this item correctly. The worth of 
the item will vary directly in proportion to its ability to dis- 
tinguish between those who rate high and low in our criterion 
score (the weighted score of each candidate in the combination 
of three measures used). 





The technique for the analysis is statistical, and application
of the formula for the biserial coefficient of correlation yields
a coefficient of .66 ± .08 for a population of 50, the top 25 and
the lowest 25 on the eligible list of persons who passed the
examination for prison keeper.
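
The article does not print the formula it used. As a rough sketch, the conventional biserial coefficient compares the mean criterion score of those who passed the item with the mean of those who failed it, and scales the difference by pq/y, where p is the proportion passing, q = 1 - p, and y is the height of the normal curve at the point dividing p from q. The function name and the ten scores below are hypothetical and serve only to show the computation.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(criterion, passed):
    """Biserial correlation between a continuous criterion and a pass/fail item."""
    if len(criterion) != len(passed):
        raise ValueError("criterion and passed must have the same length")
    hits = [c for c, ok in zip(criterion, passed) if ok]
    misses = [c for c, ok in zip(criterion, passed) if not ok]
    if not hits or not misses:
        raise ValueError("the item must be passed by some candidates and failed by others")
    p = len(hits) / len(criterion)   # proportion passing the item
    q = 1 - p                        # proportion failing the item
    sd = pstdev(criterion)           # standard deviation of the whole group
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(p))  # normal ordinate at the p/q division
    return (mean(hits) - mean(misses)) / sd * (p * q / y)


# Hypothetical criterion scores and item results for ten candidates.
scores = [90, 88, 85, 83, 80, 70, 68, 65, 63, 60]
item_passed = [True, True, True, True, False, True, False, False, False, False]
print(round(biserial_r(scores, item_passed), 2))
```

A group made up only of the top and bottom of a list, as in the study, exaggerates the spread of the criterion relative to the full list, so a coefficient obtained this way should be read as an index of discrimination rather than as an estimate for all candidates.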

A correlation of .66 in item analysis is high and indicates 
that item 89 is valid in that it measures substantially the type of 
ability measured by the criterion score. It appears to be rea- 
sonable to employ the value of biserial r in other ways. The 
size of the correlation coefficient may also serve as a check as to 
whether a given response to a test item is or is not the best 
answer. If the best members of a group do not tend to select 
the key answer, the correctness of the key may be open to serious 
question unless the group be composed of low grade persons. 

Biserial r may have another use in civil service measurement. 
Since test items cannot be tried out prior to actual administra- 
tion, it may be desirable to construct an examination which 
includes n plus c items where it is desired to have only n items.
The c items to be discarded may be determined after item analysis.
Such a procedure would tend to increase the validity of
the examination. 
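
A minimal sketch of the procedure just described, assuming that each item already has a validity coefficient (for example a biserial r against the criterion) and that the c items with the lowest coefficients are the ones discarded. The item labels and coefficients are invented for illustration.

```python
def select_items(item_validities, n):
    """Keep the n items with the highest validity; return (kept, discarded)."""
    if n > len(item_validities):
        raise ValueError("n cannot exceed the number of items administered")
    ranked = sorted(item_validities, key=item_validities.get, reverse=True)
    return ranked[:n], ranked[n:]


# Hypothetical coefficients for an examination built with n + c = 6 items.
validities = {"item_1": 0.66, "item_2": 0.12, "item_3": 0.41,
              "item_4": -0.05, "item_5": 0.30, "item_6": 0.52}
kept, discarded = select_items(validities, n=4)
print(kept)
print(discarded)
```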

It is noted, finally, that the consideration of apparent vague- 
ness, ambiguity, and kindred flaws in a test item are more or 
less irrelevant matters where a statistical analysis is available. 
The ultimate test of an item is the answer to the question: 
Does it do what it sets out to do? The item may seem to be 
as vague as the classic question ‘‘Explain the Universe, giving
two examples’’ but if the question is effective in separating the 
good from the poor workers the question must be held to be 
good and worth while. It is, of course, true that in measure- 
ment generally and in civil service examining particularly, it 
is best to avoid structural flaws because such flaws in test planning
decrease validity. On the other hand, what seems to be a
defect may be a virtue and the test lies in the statistical and 
mathematical analysis. 





MEMORY EFFECTS FROM POSTER, RADIO 
AND TELEVISION MODES OF ADVER- 
TISING AN EXHIBIT 


FRANK R. ELLIOTT 
Indiana University 


INTRODUCTION 


RECENT tests reported by the author on the relative
attention values of poster, radio, and combined poster-radio
or ‘‘television’’ modes of advertising an exhibit
have demonstrated the superiority of the three modes in the 
following order: first, television; second, radio; third, poster. 
In a life situation with 25,443 subjects, the experiment showed 
that an exhibit attracted the attention of patrons at a State 
Fair at the following rates: with no advertising, 19.9 per cent; 
with poster advertising alone, 25.4 per cent; with radio adver- 
tising alone, 32.1 per cent; and with poster and radio simul- 
taneously, television mode, 33.4 per cent. 

Now comes the question: What was the effectiveness of these
various modes of advertising upon memory? Could those who 
saw the poster advertising remember significant facts best? 
Could those who heard the radio advertising remember best? 
Or could those who saw and heard simultaneously remember 
best? How did women’s scores compare with men’s by the 
various modes?


PROCEDURE 


The reader who desires complete details of the procedure in 
these tests of attention and memory will find an account in 
the reference¹ cited below. Space here permits only a brief
review of the mechanics of the experiment. 


1 Elliott, Frank R. Attention Effects from Poster, Radio and Poster-Radio
Advertising of an Exhibit. J. Appl. Psychol., August, 1937.


An exhibit showing the advantages of fresh fruit in the 
human diet was set up at the Indiana State Fair in Indian- 
apolis. The exhibit was presented at nearly all hours of the 
day over the period of one week, under the following condi- 
tions: (1) with no accompanying explanation or advertising; 
(2) with posters explaining the main points; (3) with radio 
explanation of the same points by means of electrical tran- 
scription records and a small public address system; and (4) 
with poster and radio explanations combined in a mode some- 
what resembling television and which for convenience is 
referred to as the television mode. 

Trained psychological observers kept careful count of persons 
entering the exhibit hall and thus coming within range of the 
exhibit and its accompanying advertising. If a person stopped 
and looked at the exhibit, he was counted as giving the atten- 
tion response. If he passed by without stopping, he was 
counted as not giving the attention response. In this way per- 
centage computations showed the relative attention power of 
the exhibit under the four sets of conditions. 

Accepted psychological control procedures were followed in 
equalizing the time for each mode, in rotating the testing 
periods systematically over various hours of the day, in main- 
taining volume control of the radio receiver so that the adver- 
tising could be heard as far as it could be seen. Young and 
old, educated and uneducated, rich and poor—all classes of 
people were tested, and the checking was so concealed that none 
of them knew they were being tested. This is the life situation, 
far removed from the artificiality of the usual laboratory scene. 

In testing for memory after the attention test had been com- 
pleted, it was necessary to provide a special incentive in the 
form of a prize contest to keep the experiment still within the 
frame of the life situation and incidental memory. Persons 
leaving the exhibit hall were handed slips of paper inviting 
them to enter a prize contest by registering at the opposite end 
of the main hall about 160 feet away. Some persons accepted 
the invitation and went to the contest table and registered;
others threw their invitations away. 








Those registering were asked to answer a questionnaire, ‘‘in
order to aid the management of the exhibits in selecting effective
methods of presentation,’’ the participants were informed.
If a person remembered seeing the fruit exhibit, among the 
other real and fictitious exhibits listed by subject in the recog- 
nition test, that person was carried along to other points in the 
questionnaire in order to test him for details of memory. A 
number was assigned each questionnaire for use in the prize 
drawing held later. Name, sex, and age range of each subject 
were noted. 

Memory responses to the various modes were obtained from 
179 persons taking the test and answering the questionnaire, including
131 women and 48 men. While the final number of
subjects was not large, statistical techniques² were applied to
the data to determine the reliability of the differences found in 
the memory scores. 

RESULTS 


Results are shown in the following tables:


TABLE I
Radio Versus Poster

COMPARISONS             ALL SUBJECTS          WOMEN ONLY            MEN ONLY
Mode                    Radio      Poster     Radio      Poster     Radio      Poster
Number of Subjects      49         69         28         50         21         19
Average Score           12.29      7.74       13.79      7.60       10.29      8.11
Advantage               Radio = 4.45          Radio = 6.19          Radio = 2.18
Per cent Advantage      Radio = 57.5          Radio = 81.4          Radio = 26.9
Reliability             100 chances in 100    100 chances in 100    83 chances in 100





There were six comparisons in all:


Radio vs. Poster
Radio vs. Television
Radio vs. No Advertising
Poster vs. Television
Poster vs. No Advertising
Television vs. No Advertising

2 See Garrett, H. E. Statistics in Psychology and Education. 1926.
Longmans, Green and Company. New York.








TABLE II
Radio Versus Television

COMPARISONS             ALL SUBJECTS          WOMEN ONLY            MEN ONLY
Mode                    Radio      Tel.       Radio      Tel.       Radio      Tel.
Number of Subjects      49         45         28         40         21         5
Average Score           12.29      10.80      13.79      10.60      10.29      12.40
Advantage               Radio = 1.49          Radio = 3.19          Tel. = 2.11
Per cent Advantage      Radio = 13.8          Radio = 30.1          Tel. = 20.5
Reliability             83 chances in 100     96 chances in 100     71 chances in 100

TABLE III
Radio Versus No Advertising

COMPARISONS             ALL SUBJECTS          WOMEN ONLY            MEN ONLY
Mode                    Radio      No adv.    Radio      No adv.    Radio      No adv.
Number of Subjects      49         16         28         13         21         3
Average Score           12.29      7.63       13.79      8.62       10.29      3.33
Advantage               Radio = 4.66          Radio = 5.17          Radio = 6.96
Per cent Advantage      Radio = 61.1          Radio = 60.0          Radio = 209.0
Reliability             100 chances in 100    100 chances in 100    100 chances in 100

TABLE IV
Poster Versus Television

COMPARISONS             ALL SUBJECTS          WOMEN ONLY            MEN ONLY
Mode                    Poster     Tel.       Poster     Tel.       Poster     Tel.
Number of Subjects      69         45         50         40         19         5
Average Score           7.74       10.80      7.60       10.60      8.11       12.40
Advantage               Tel. = 3.06           Tel. = 3.00           Tel. = 4.29
Per cent Advantage      Tel. = 39.5           Tel. = 39.5           Tel. = 52.9
Reliability             99 chances in 100     98 chances in 100     84 chances in 100





TABLE V
Poster Versus No Advertising

COMPARISONS             ALL SUBJECTS          WOMEN ONLY            MEN ONLY
Mode                    Poster     No adv.    Poster     No adv.    Poster     No adv.
Number of Subjects      69         16         50         13         19         3
Average Score           7.74       7.63       7.60       8.62       8.11       3.33
Advantage               Poster = .11          No Adv. = 1.02        Poster = 4.78
Per cent Advantage      Poster = 1.4          No Adv. = 13.4        Poster = 143.5
Reliability             79 chances in 100     76 chances in 100     97 chances in 100

TABLE VI
Television Versus No Advertising

COMPARISONS             ALL SUBJECTS          WOMEN ONLY            MEN ONLY
Mode                    Tel.       No adv.    Tel.       No adv.    Tel.       No adv.
Number of Subjects      45         16         40         13         5          3
Average Score           10.80      7.63       10.60      8.62       12.40      3.33
Advantage               Tel. = 3.17           Tel. = 1.98           Tel. = 9.07
Per cent Advantage      Tel. = 41.5           Tel. = 23.0           Tel. = 272.4
Reliability             99 chances in 100     92 chances in 100     99 chances in 100

Note: In cases where the number of subjects was exceedingly small,
such as 3 and 5, the reliability calculations shown have little value even
though the differences are overwhelmingly large. The complete comparisons
are shown only with this understanding.


In Table I the radio memory score is 57.5 per cent higher
for all subjects, 81.4 per cent higher for women, and 26.9 per
cent higher for men than is the poster memory score. Radio
outscored television or poster-radio combined, as shown by
Table II, in two out of three comparisons, those for all subjects
(13.8 per cent) and for women (30.1 per cent). Table III shows
radio far out ahead of no advertising with advantages of 61.1
per cent for all subjects, 60 per cent for women, and 209.0 per
cent for men.
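
The ‘‘Advantage’’ and ‘‘Per cent Advantage’’ rows of the tables appear to be the difference between the two average scores and that difference expressed as a percentage of the lower score. The article does not state the rule, so it is inferred here from the printed figures. A short sketch with a hypothetical function name, checked against values from Tables II and V:

```python
def percent_advantage(mean_a, mean_b):
    """Return (difference, per cent advantage) of the higher mean over the lower.

    Assumption: the percentage is taken on the lower of the two means,
    which is the base that reproduces the figures printed in the tables.
    """
    higher, lower = max(mean_a, mean_b), min(mean_a, mean_b)
    difference = higher - lower
    return difference, 100 * difference / lower


# Radio (12.29) vs. television (10.80), all subjects, Table II.
diff, pct = percent_advantage(12.29, 10.80)
print(round(diff, 2), round(pct, 1))

# Poster (7.60) vs. no advertising (8.62), women only, Table V.
diff, pct = percent_advantage(7.60, 8.62)
print(round(diff, 2), round(pct, 1))
```

Because the base is the lower mean, a very small base such as the men's no-advertising score of 3.33 produces percentages above 200 even though only three subjects are involved.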

It will be noted that the differences in favor of radio are
mainly of high statistical reliability; that is, there are usually
100 chances out of 100 that a real difference would prevail if
the sampling were continued. The reliability formulas used in
the present calculations are³ those commonly used in statistical
procedure.


3 See Garrett, H. E. Statistics in Psychology and Education.
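
The article cites Garrett for its reliability formulas without reproducing them. One conventional reading, offered here only as an assumption, is the critical-ratio method: the difference between two means is divided by the standard error of that difference, and the result is converted through the normal curve into the chances in 100 that the true difference is greater than zero. The article reports no standard deviations, so the values of 6.0 below are hypothetical and the output is not meant to reproduce any tabled figure.

```python
from math import sqrt
from statistics import NormalDist


def chances_in_100(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Chances in 100 that the true difference (a minus b) is above zero."""
    # Standard error of the difference between two independent means.
    se_difference = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    critical_ratio = (mean_a - mean_b) / se_difference
    return 100 * NormalDist().cdf(critical_ratio)


# Means and group sizes from Table I (all subjects); the SDs are hypothetical.
print(round(chances_in_100(12.29, 6.0, 49, 7.74, 6.0, 69)))
```

With groups as small as 3 or 5, the standard error is itself poorly estimated, which is why the note to the tables warns that the reliability figures for those groups have little value.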





TABLE VII
Ranking of Modes

COMPARISONS OF MODES                NUMBER OF TIMES ADVANTAGE
Radio vs. Poster                    Radio      3 out of 3
Radio vs. Television                Radio      2 out of 3
Radio vs. No Advertising            Radio      3 out of 3
    Total Radio                                8 out of 9

Television vs. Radio                Tel.       1 out of 3
Television vs. Poster               Tel.       3 out of 3
Television vs. No Advertising       Tel.       3 out of 3
    Total Television                           7 out of 9

Poster vs. Radio                    Poster     0 out of 3
Poster vs. Television               Poster     0 out of 3
Poster vs. No Advertising           Poster     2 out of 3
    Total Poster                               2 out of 9

No Advertising vs. Radio            No Adv.    0 out of 3
No Advertising vs. Poster           No Adv.    1 out of 3
No Advertising vs. Television       No Adv.    0 out of 3
    Total No Advertising                       1 out of 9





Summary: 
Radio held advantage in 8 out of 9 comparisons. 
Television held advantage in 7 out of 9 comparisons. 
Poster held advantage in 2 out of 9 comparisons. 
No Adv. held advantage in 1 out of 9 comparisons. 
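
The tallies in the summary can be reproduced directly from the average scores printed in Tables I to VI. The sketch below pairs every mode with every other mode for all subjects, women, and men, and counts which mode has the higher average in each pairing. The variable and function names are illustrative only.

```python
from itertools import combinations

# Average memory scores as printed in Tables I to VI.
MEAN_SCORES = {
    "Radio": {"all": 12.29, "women": 13.79, "men": 10.29},
    "Television": {"all": 10.80, "women": 10.60, "men": 12.40},
    "Poster": {"all": 7.74, "women": 7.60, "men": 8.11},
    "No advertising": {"all": 7.63, "women": 8.62, "men": 3.33},
}


def tally_wins(scores):
    """Count, for each mode, the pairings in which it has the higher average."""
    wins = {mode: 0 for mode in scores}
    for mode_a, mode_b in combinations(scores, 2):
        for group in scores[mode_a]:
            if scores[mode_a][group] > scores[mode_b][group]:
                wins[mode_a] += 1
            elif scores[mode_b][group] > scores[mode_a][group]:
                wins[mode_b] += 1
    return wins


for mode, count in tally_wins(MEAN_SCORES).items():
    print(f"{mode}: {count} out of 9")
```

A tally of this kind treats a negligible edge (poster over no advertising, 7.74 against 7.63) the same as a large one, so it is best read alongside the reliability figures.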


When we compare television with poster, shown in Table IV, 
we see that the advantage is uniformly in favor of the former 
by percentages of 39.5 per cent, 39.5 per cent, and 52.9 per cent; 
and that the reliability of the difference is high (99 and 98 
chances out of 100) for all subjects and for women. 

It is rather striking from Table V that poster for all subjects 
shows only a 1.4 per cent advantage over no advertising at all. 
It is even more striking that with women poster advertising is 
actually 13.4 per cent worse than no advertising at all, although 
for men alone this advantage is wiped out with a 143.5 per cent 
advantage for poster. The reliability of the difference is low 
for all subjects and for women. 


In Table VI we see that television always substantially surpasses
no advertising; the percentages are 41.5, 23, and 272.4;
the reliability of the differences is fairly high, 99, 92, and 99
chances out of 100. In checking back through the data, we see 
that only poster advertising failed to surpass no advertising at 
all and that radio and television both were far more effective 
than no advertising at all. 

The summary in Table VII shows the rankings of the three 
advertising media for men and women separately and com- 
bined and in all possible modal cross comparisons. 

Radio outranks poster, television and no advertising in eight 
out of nine possible cases and the percentages in favor of radio 
are usually very large. In six out of the eight favorable show- 
ings, the difference in favor of radio is statistically reliable. 

Television outranks the other modes in seven out of a pos- 
sible nine cases, with four of the seven cases showing a statis- 
tically reliable advantage. 

Poster outranks the other three modes in only two out of 
nine cases and in one of those two cases the advantage is in- 
significant, 1.4 per cent, and the statistical reliability is very 
low. 

The no advertising mode wins in only one out of nine cases 
and the result is statistically unreliable, thus demonstrating the 
superiority of any kind of advertising used in this experiment 
over no advertising at all. 

One interesting finding in this study is the advantage which
women show for radio over poster and poster-radio or television
mode in comparison with the advantage shown by men, i.e., 81.4
per cent for women vs. 26.9 per cent for men in Table I; and
a 30.1 per cent radio advantage for women vs. a 20.5 per cent
television advantage for men in Table II.
This is consistent with the results of a further study in the 
present series by the author, to be presented later in this jour- 
nal, demonstrating that women show a far greater advantage 
of ear memory over eye memory for fictitious trade names 
than do men. 

Such findings seem to have importance to advertisers since
women are recognized as buyers for the home. Surveys⁴ have
shown that women spend much more time listening to the radio 
than do men. This may have habituated women to radio ad- 
vertising with the result, as shown here and in the study to 
follow, that women remember better by radio than men remem- 
ber by radio. 


SUMMARY AND CONCLUSIONS 


This is a study in the life situation of memory response to 
poster, radio and television advertising. An exhibit was pre- 
sented to a State Fair audience with no accompanying explana- 
tion and at consecutive periods in rotation with poster, radio, 
and poster-radio or television modes. A group of 179 subjects 
selected at random answered a questionnaire showing whether 
or not they remembered the exhibit and how many details they 
remembered. Average scores were computed for the various 
modes, for both sexes together and for male and female sexes 
separately, and the reliability of the differences was calculated. 
Cross comparisons were then made, each of the four modes with 
the other three for all subjects combined and for each of the 
sexes separately. 

The results show that radio held the advantage in 8 out of 9 
cross-comparisons; television in 7 out of 9; poster in 2 out of 9;
and no advertising in 1 out of 9. 

Radio surpassed no advertising by 61.1 per cent, poster by 
57.5 per cent, and television by 13.8 per cent—in the compari- 
sons for all subjects. The chances in 100 for a reliable differ- 
ence were 100, 100 and 83, respectively. 

Television outranked poster by 39.5 per cent and no adver- 
tising by 41.5 per cent, with 99 chances in 100 for each com- 
parison that the difference was a reliable difference. Tele- 
vision was surpassed, in the comparison for all subjects, only 
by radio and the difference there was comparatively small
(13.8 per cent) and unreliable (83 chances in 100 of a reliable 
difference). 


4 Cantril, Hadley, and Allport, Gordon W. The Psychology of Radio. 1935.
Harper. New York.

Poster mode held the advantage only in one comparison for 
all subjects and that with the no advertising situation, and here 
the advantage was negligible (1.4 per cent) and the difference 
highly unreliable (79 chances in 100). 

In all comparisons, the no-advertising situation suffered, thus 
indicating that any kind of advertising used enhanced memory 
for the exhibit. 

Sex comparisons showed women scoring relatively higher for 
radio mode than men and the suggestion is made that this may 
be due to the fact, as shown by surveys, that women listen more 
to radio than do men. 








THE RELATIVE SIGNIFICANCE OF MEASURES 
OF MECHANICAL APTITUDE, INTELLI- 
GENCE, AND PREVIOUS SCHOLAR- 
SHIP FOR PREDICTING ACHIEVE- 
MENT IN DENTAL SCHOOL 


ALBERT J. HARRIS 
College of the City of New York 


DENTISTRY is a profession in which mechanical skill
and theoretical knowledge are both required. The
problem of predicting success in dentistry offers an 
opportunity to weigh the possible value of mechanical apti- 
tude tests in a complex professional field of work where 
mechanical aptitude would seem to be a prerequisite for suc- 
cess. This paper presents evidence on the usefulness of sev- 
eral measures of mechanical aptitude in predicting success in 
dental school, in comparison with intelligence test scores and 
records of previous scholarship. 


THE PRELIMINARY STUDY 


An announcement of the aim of the study was made to the 
members of the first year class at the Tufts College Dental 
School. Fifty students, approximately two-thirds of the class, 
volunteered and took the tests during March and April, 1931. 

1 The unfailing cooperation of the administrative officers of the school,
the late Dr. William Rice and Dr. Howard Y. Marjerison, is gratefully
acknowledged.

The tests used were Worksamples Nos. 5, 16, and 17 (2),
generally known as the Wiggly Block Test, Finger Dexterity
Test, and Tweezer Dexterity Test, and a Hand Steadiness
Test. The Wiggly Block Test is a rectangular block of wood
made up of nine irregularly cut pieces. The pieces are taken
apart before the subject and shuffled. The score is a weighted
average of the times taken to assemble the block on three trials.
The test is intended to measure perception of three-dimensional 
shapes and spatial relationships. Its reliability has recently 
been reported as being only .68(3). The Finger Dexterity 
test is a metal plate with 100 holes punched in it. The task 
is to pick up three small metal pins at a time with the fingers 
of one hand and insert them in one of the holes. This is 
repeated until all holes are filled. Scoring is in terms of time 
taken. The Tweezer Dexterity test is similar except that the 
holes are smaller, and are filled with one pin each, using 
tweezers. Scoring is also in terms of time. The reliabilities 
of these two tests are reported as in the eighties. The Hand 
Steadiness Test used employs an inclined metal plate with 
holes of various sizes, a metal stylus, and an electric counter. 
The score is a weighted average of the number of contacts
made in four 30-second insertions of the stylus (two trials in
each of two medium-sized holes) and the number of unsuccessful
trials at inserting the stylus into and withdrawing it
from each of the holes, on two trials. The reliability of this
test, determined from the present results, is .67.

The tests were administered individually to each subject by 
the writer. The time necessary for administering the battery 
of tests to one subject ranged between 30 and 45 minutes. 

The criterion used in this preliminary study was the mean 
grade received by the student in the first year courses in Dental 
Anatomy and Dental Technology. These two courses were 
selected because they involved more manipulation and appre- 
ciation of spatial relationships than the other first year courses. 

Table I gives the means, standard deviations, and standard 
errors of the criterion and the four tests. The performance 
of the group as a whole was below average on the Wiggly Block 
Test, and markedly above average on the Finger and Tweezer 
Dexterity Tests. O’Connor’s norms, based on applications for 
employment at the General Electric Company, are given in 
letter ratings corresponding to quartiles, A being the highest 
quartile and D the lowest. Letter ratings for the students
were as follows: on the Wiggly Block Test, 4 As, 11 Bs, 21 Cs,
and 14 Ds; on the Finger Dexterity Test, 32 As, 12 Bs, 2 Cs,
and 4 Ds; and on the Tweezer Dexterity Test, 35 As, 9 Bs,
4 Cs, and 2 Ds.

TABLE I

Constants of the Distributions of Average Grades and Four Mechanical
Aptitude Tests for 50 First Year Dental Students

VARIABLE                        MEAN AND S.E.       S.D. AND S.E.
I. Average Grade                75.48 ± .405        2.861 ± .286
1. Wiggly Block Test            4.882 ± .299        2.115 ± .212
2. Finger Dexterity Test        3.9384 ± .095       .674 ± .067
3. Tweezer Dexterity Test       4.857 ± .816        .577 ± .058
4. Hand Steadiness Test         16.34 ± 1.179       8.34 ± .834
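
The standard errors in Table I are not explained in the text. As a sketch under that caveat, the usual large-sample formulas, S.D./√N for the standard error of a mean and S.D./√(2N) for the standard error of a standard deviation, reproduce the printed values for the average grade and the Hand Steadiness Test when N = 50. The function name is hypothetical.

```python
from math import sqrt


def standard_errors(sd, n):
    """Return (S.E. of the mean, S.E. of the standard deviation)."""
    se_mean = sd / sqrt(n)
    se_sd = sd / sqrt(2 * n)
    return se_mean, se_sd


# S.D. values taken from Table I; N = 50 students.
for label, sd in [("Average Grade", 2.861), ("Hand Steadiness Test", 8.34)]:
    se_mean, se_sd = standard_errors(sd, 50)
    print(label, round(se_mean, 3), round(se_sd, 3))
```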

The correlations between the four tests and the criterion 
are shown in Table II. The correlations between the criterion, 
the Wiggly Block Test, and the Hand Steadiness Test are too 
low to be considered statistically significant. The correlations 
between the Finger and Tweezer Dexterity Tests and the criterion
are statistically significant, but low, -.395 and -.361.²


TABLE II

Intercorrelations Between Average Grades and Four Mechanical Aptitude
Tests, for 50 First Year Dental School Students

VARIABLE                       I         1         2         3         4
I. Average Grade                       -.139*    -.395     -.361     -.194
1. Wiggly Block Test         -.139               +.219     +.080     +.167
2. Finger Dexterity Test     -.395     +.219               +.337     +.097
3. Tweezer Dexterity Test    -.361     +.080     +.337               +.587
4. Hand Steadiness Test      -.194     +.167     +.097     +.587

* When N = 50, the probability that a similar correlation might occur by
random sampling from an uncorrelated population is less than 1 in 100
for correlation above .35 (1, p. 549).

2 The negative correlations for all of the four tests are due to the
methods of scoring used, in which high scores indicate poor performance
and low scores indicate good performance.

Since the correlation between these two tests is also low, the
combination of the two tests by means of multiple correlation
offers the possibility of predictions more accurate than those
obtainable from either test taken separately. The multiple
correlation between the criterion and these two tests was found
to be .465, with a standard error of estimate of 4.32. Predictions
from such a multiple correlation are only 12.5 per cent
better than predictions made on a pure chance basis.
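
For two predictors, the multiple correlation can be computed from the three pairwise correlations alone. The sketch below applies the standard two-predictor formula to the coefficients in Table II (-.395 and -.361 with the criterion, +.337 between the two dexterity tests). It yields a value of about .46, close to the reported .465; the small gap presumably reflects rounding in the printed coefficients. The function name is illustrative.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Finger and Tweezer Dexterity Tests against average grade, from Table II.
print(round(multiple_r(-0.395, -0.361, 0.337), 2))
```

The formula also shows why a low correlation between the predictors matters: the closer r_12 is to zero, the more of each test's relation to the criterion is retained when the two are combined.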

The mechanical aptitude tests used did not show much 
promise in predicting success in the first year courses. However,
it was decided to continue the investigation on a more
comprehensive scale. For one thing it was thought that the 
criterion used was possibly inadequate, since it was based on 
only two first year courses. The possibility existed that 
mechanical aptitudes of the kinds measured by the tests might 
become more important as the students continued in dental 
school and the work became more practical. Furthermore, 
it was believed that intelligence and previous scholastic 
achievement would show significant relationships with success 
in dental school, and that the combination of these two factors 
with the mechanical aptitude tests might show a satisfactorily
high relationship with dental school work.


THE MAIN STUDY 


Arrangements were made to administer a battery of tests to 
all members of the class entering the dental school in October, 
1932. A new test, called the Cube Carving Test, was added 
to the battery used in the preliminary study. In this test each 
student was given a 1 cc. cube of hard wax, a carving tool, and 
a gauge, and told to carve out of the wax as nearly perfect a
sphere as he could. The test was administered by the two 
instructors in the Dental Anatomy course as the first labora- 
tory exercise, with a time limit of one hour. The products 
were rated by each instructor on a five point scale and the two 
ratings were summed, so that scores ranged from 2 (best) to 
10 (worst). The Cube Carving Test was added to the battery 
partly because the carving of wax models is important in 
dental laboratory work, and partly because the test was already 
being used in one well-known dental school as an elimination 
test for candidates for admission. 





The Otis Self-Administering Test of Mental Ability, Higher 
Examination, Form B was administered to all of the students 
with a 20 minute time limit, during one of the regular school 
periods. Complete records of the students’ collegiate studies 
were available in the school files. From these records two 
scores were derived. One, called the Pre-Dental Total Aver- 
age, is the mean of the grades received by the student in all of 
his college courses. The other, called Pre-Dental Science 
Average, is the mean of the grades received by the student in 
college science courses only. 

Two criteria were selected for use in this study. One is the 
mean of all courses taken during the first year of dental school. 
The other is the mean of all courses taken during the four years 
of dental school. In the cases of students who were dropped 
from school or resigned, the mean of all courses taken up to the 
time of leaving was used for the four year average. 

The possibility of correctly evaluating the usefulness of pre- 
dental scholarship was increased by the decision of the school 
administration to admit, on an experimental basis, a few stu- 
dents with poor grades in their pre-dental work. 

All 67 members of the class were tested individually by the 
writer during the first three weeks of October, 1932. One stu- 
dent, whose preparation had been made in Europe and whose 
mastery of English was imperfect, was not included in the 
study. Of the 66 students retained, 59 graduated and 7 were 
either dropped or resigned after being warned of probable 
failure to graduate. 


RESULTS 


The means, standard deviations, and standard errors of all 
the variables involved in this study are presented in Table III. 

TABLE III

Constants of the Distributions of Scholarship, Several Mechanical Aptitude
Tests, Intelligence, and Pre-Dental Scholarship for 66 Dental School
Students

VARIABLE                            MEAN AND S.E.       S.D. AND S.E.

 I. Four Year Average Grade        78.53  ±  .673       5.466 ± .476
II. First Year Average Grade       78.85  ±  .493       4.007 ± .349
 1. Wiggly Block Test               5.074 ±  .308       2.443 ± .213
 2. Finger Dexterity Test           3.982 ±  .076        .615 ± .053
 3. Tweezer Dexterity Test          4.923 ±  .067        .540 ± .048
 4. Hand Steadiness Test           12.909 ±  .954       7.751 ± .674
 5. Cube Carving Test               6.106 ±  .329       2.687 ± .234
 6. Otis Intelligence Test         53.439 ± 1.159       9.419 ± .819
 7. Pre-Dental Total Average       76.515 ±  .574       4.668 ± .406
 8. Pre-Dental Science Average     75.318 ±  .657       5.349 ± .466

This group of students, like the group used in the preliminary
study, was below average on the Wiggly Block Test and
markedly above average on the Finger Dexterity Test. The
letter ratings obtained were: Wiggly Block Test, 10 As, 17 Bs,
25 Cs, and 14 Ds; Finger Dexterity Test, 35 As, 14 Bs, 12 Cs,
and 5 Ds; and Tweezer Dexterity Test, 43 As, 14 Bs, 7 Cs, and
2 Ds. This group does not differ significantly from the group
used in the preliminary study on any of the four tests.
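
The standard errors printed in Table III appear to follow the usual large-sample formulas, S.E. of the mean = S.D./√N and S.E. of the S.D. = S.D./√(2N), with N = 66. The article does not state which formulas were used, so the Python sketch below is only a check under that assumption; the function names are illustrative and do not come from the source.

```python
import math

N = 66  # number of students retained in the main study


def se_of_mean(sd, n):
    """Standard error of a mean, assuming the formula S.D. / sqrt(N)."""
    return sd / math.sqrt(n)


def se_of_sd(sd, n):
    """Standard error of a standard deviation, assuming S.D. / sqrt(2N)."""
    return sd / math.sqrt(2 * n)


# S.D. of the Four Year Average Grade, as printed in Table III.
four_year_sd = 5.466
print(round(se_of_mean(four_year_sd, N), 3))
print(round(se_of_sd(four_year_sd, N), 3))
```

For the Four Year Average these assumed formulas reproduce the printed .673 and .476. A few other rows differ in the third decimal, which may reflect rounding or a slightly different formula in the original computation.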

The intercorrelations between all the variables are given in 
Table IV. None of the correlations between the two criteria 
and the five mechanical aptitude measures is reliably greater 
than chance. The Finger Dexterity Test, which showed the 
highest relationship with the criterion in the preliminary 
study, has an r of -.098 with the First Year Average, and an 
r of +.151 with Four Year Average, indicating a very slight 
tendency for those who did poor work on the test to get better 
than average grades. Of the five tests, the highest correlation 
with the criteria is found in the Tweezer Dexterity Test, but 
its highest r, —.171 with First Year Average, is not reliably 
greater than zero. 

Significant correlations with the criteria are found for the 
Otis Intelligence Test and the two Pre-Dental Averages. With 
the First Year Average the correlations are .552 for Otis, .440 
for Pre-Dental Total Average, and .410 for Pre-Dental Science 
Average. With the Four Year Average the correlations are 
.356, .528, and .588 respectively. The Otis has a higher rela- 
tionship with the first year grades than with the four year 
grades while the Pre-Dental Averages correlate more highly 
with the Four Year Average. 





TABLE IV

Intercorrelations Between Four Year Average, First Year Average,
Mechanical Aptitude Tests, Intelligence, and Pre-Dental
Scholarship for 66 Dental School Students

VARIABLE                              I     II      1      2      3      4      5      6      7      8

 I. Four Year Average                        …      …  +.151  -.104  -.044      …  +.356  +.528  +.588
II. First Year Average                …             …  -.098  -.171      …  -.063  +.552  +.440  +.410
 1. Wiggly Block Test                 …      …             …  +.117      …      …  -.149      …      …
 2. Finger Dexterity Test         +.151  -.098      …         +.572      …      …  -.008      …      …
 3. Tweezer Dexterity Test        -.104  -.171  +.117  +.572         +.414  +.318  +.015      …  -.081
 4. Hand Steadiness Test          -.044      …      …      …  +.414         +.010  -.001      …  -.283
 5. Cube Carving Test                 …  -.063      …      …  +.318  +.010             …  -.150  -.089
 6. Otis Intelligence Test        +.356  +.552  -.149  -.008  +.015  -.001      …         +.091  +.056
 7. Pre-Dental Total Average      +.528  +.440      …      …      …      …  -.150  +.091         +.876
 8. Pre-Dental Science Average    +.588  +.410      …      …  -.081  -.283  -.089  +.056  +.876

[… marks an entry that is illegible in the source.]

* When N = 66, the probability that a … correlation might occur by random
sampling from an uncorrelated population is less than 1 in 100 for
correlations above … (1, p. …).



The five mechanical aptitude tests show in general positive 
but low correlations with each other. The generally positive 
trend of the correlations is probably significant (9 of the 10 rs 
are positive), but only three of them are reliably greater than 
chance. Of the five tests, the Tweezer Dexterity Test has the 
highest intercorrelations with the others, having rs of .572, 
.414, .318, and .117 with the Finger Dexterity, Hand Steadiness,
Cube Carving, and Wiggly Block tests respectively. This order of 
decreasing relationship corresponds to what one would expect 
from comparison of the responses required on the five tests. 
The correlations between the five tests and the Otis Intelligence 
Test cluster around zero, and all except one of the ten correla- 
tions with pre-dental scholarship are also close to zero. 

The correlations between intelligence, pre-dental scholarship, 
and the two criteria were examined for multiple correlation. 
Since the Otis test has very low correlations with the two measures
of pre-dental scholarship, it was naturally included. The two measures
of pre-dental scholarship correlate highly with each 
other, .876, so little will be gained by using both. The Pre- 
Dental Science Average was selected because it has a higher 
correlation with the Four Year Average and a slightly lower 
correlation with Otis. The multiple correlation coefficients and 
corresponding standard errors of estimate are given in Table V. 
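
The multiple correlations for the two-predictor combination can be checked with the standard formula for two predictors, R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²). The sketch below is a hedged illustration, not the author's worksheet: the criterion correlations are those quoted in the text, while the Otis–Science Average correlation of .056 is an assumption, taken as the value Table IV appears to give. The function and constant names are invented for the example.

```python
import math


def multiple_r(r_c1, r_c2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_c1, r_c2: correlations of the criterion with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_c1**2 + r_c2**2 - 2 * r_c1 * r_c2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Assumed reading of Table IV for Otis vs. Pre-Dental Science Average.
R_OTIS_SCIENCE = 0.056

# Criterion correlations quoted in the text (Otis, Science Average).
r_four_year = multiple_r(0.356, 0.588, R_OTIS_SCIENCE)
r_first_year = multiple_r(0.552, 0.410, R_OTIS_SCIENCE)

print(f"Four Year Average:  R = {r_four_year:.3f}")
print(f"First Year Average: R = {r_first_year:.3f}")
```

Under these assumptions both values come out within about .001 of the .670 reported, which suggests the printed coefficients are internally consistent.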


TABLE V

Coefficients of Multiple Correlation and Standard Errors of Estimate for
Predicting Four Year Average and First Year Average for 66
Dental Students

VARIABLE*           R AND S.E.         S.E. OF ESTIMATE

R I.68              .670 ± .069             4.06
R I.368             .676 ± .086             4.02
R II.68             .670 ± .069             2.97
R II.368            .690 ± .066             2.90

* The symbols refer to the variables named in Table IV.


The combination of Otis score and Pre-Dental Science Average 
gives Rs of .670 with both criteria. The addition of the 
Tweezer Dexterity Test improves the multiple correlation a 
negligible amount, .006 in one case and .020 in the other. Since 
the prediction of dental school work from previous scholarship 
alone is only 19 per cent better than chance, and prediction 
from the combination of intelligence score with pre-dental 
scholarship is 26 per cent better than chance (1, p. 551), there 
is a worth-while increase in the accuracy of prediction to be 
gained from the use of the combination. The multiple correla- 
tion of .670, while not high enough for accurate prediction of 
individual success or failure, is as high as most multiple corre- 
lations reported in the literature on aptitude testing. 
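
The percentages "better than chance" quoted above agree with the index of forecasting efficiency, E = 1 − √(1 − r²), and the standard errors of estimate in Table V agree with S.D.·√(1 − R²). The article gives only the resulting figures, so the following Python sketch rests on the assumption that these are the formulas intended; the function names are illustrative.

```python
import math


def forecasting_efficiency(r):
    """Proportional reduction in error of prediction, assuming E = 1 - sqrt(1 - r^2)."""
    return 1 - math.sqrt(1 - r**2)


def standard_error_of_estimate(criterion_sd, r):
    """Standard error of estimate, assuming S.D. * sqrt(1 - r^2)."""
    return criterion_sd * math.sqrt(1 - r**2)


# Pre-Dental Science Average alone vs. the Otis + Science Average combination.
print(round(100 * forecasting_efficiency(0.588)))  # previous scholarship alone
print(round(100 * forecasting_efficiency(0.670)))  # combination

# Criterion S.D.s from Table III: Four Year 5.466, First Year 4.007.
print(round(standard_error_of_estimate(5.466, 0.670), 2))
print(round(standard_error_of_estimate(4.007, 0.670), 2))
```

With r = .588 the index is about 19 per cent and with R = .670 about 26 per cent, matching the text; the two standard errors of estimate come to 4.06 and 2.97, matching Table V.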


SUMMARY AND CONCLUSIONS 


A preliminary study of the relationship of four mechanical 
aptitude tests to First Year achievement in dental school gave 
results justifying a more intensive study. In the main study 
five mechanical aptitude tests, the Otis Intelligence Test, and 
two measures of pre-dental scholarship were correlated with 
scholarship in the first year and in the four years of dental 
work, using a group of 66 students. The mechanical aptitude 
tests, including one that has been used as an entrance test by 
a large dental school, all gave correlations of negligible size 
with dental school scholarship. Intelligence and pre-dental 
scholarship both showed substantial relationship to the criteria, 
and when combined gave a multiple correlation coefficient of 
.670 with dental school marks. The increase in accuracy of 
prediction obtained from the use of the combination, over pre- 
diction from pre-dental scholarship alone, was of worth-while 
size. 

REFERENCES 
1. Guilford, J. P. Psychometric Methods, New York: McGraw-Hill 
Book Company, Inc., 1936, pp. xvi and 565. 
2. O’Connor, J. Born That Way, Baltimore: The Williams & Wilkins 
Company, 1928, pp. 323. 
3. Remmers, H. H., and Smith, J. M. Reliability and practice effect 
in the O’Connor Wiggly Block Test, Journal of Applied Psychology,
1936, 20, 591-598. 








VALIDITY AND INDEPENDENT CRITERIA IN 
TESTS AND RATINGS 


Z. C. DICKINSON 
University of Michigan 


THE concepts named above are of course quite fundamental
in mental measurements generally. In the following
paragraphs an attempt is made to restate their 
natures and relations in simple terms, and especially to empha- 
size the point that any measurement may be viewed from either 
of these standpoints, as a test or scale which is to be validated, 
or as an instrument which might be used in validating some 
other measuring rod. 

Let us begin with a survey of some connotations of the term 
‘‘objective,’’ in connection with tests and scales. A test is said 
to be relatively objective when its elements and procedures are 
so explicitly defined that the scores made in it are not much 
affected by variable moods and leniencies of the examiners who 
give or score it. An examination, for example, of the ‘‘true- 
false’’ variety may be more objective in this sense than a test, 
with similar subject-matter, of the completion type—having 
blank spaces in which the examinee writes the missing words, 
numbers, or phrases; because in the latter sort of test the right 
answers may be approached more closely than in the former, 
by responses which are almost but not quite right. 

We may notice here also that objective mental measurements 
fall into two general categories, namely (1) the simpler quan- 
titative series, such as number of taps per second or number of 
seconds per tap; and (2) what might be called qualitative 
series, or, as they are commonly denoted, specimen scales. In 
such a specimen scale, the unit tests appear on the surface to 
be heterogeneous, but they are nevertheless thought to indicate, 










CRITERIA IN TESTS AND RATINGS 523 


or be symptomatic of, progressive steps in a line of somewhat 
homogeneous abilities. Examples are Binet scales, handwrit- 
ing scales, and trade tests. An objective spelling test is also 
a specimen scale and it is easy to score such a test once the 
words in it have been chosen. The term ‘‘scale’’ applied to 
ratings of persons in various traits of character or performance 
indicates that this type of instrument, too, is an outgrowth of 
efforts like those of Binet and Thorndike, to measure traits by 
means of standard series of behavior samples. 

The degree of objectivity of a test, however, tells us little of 
its significance or validity. We might, for instance, stand- 
ardize a procedure for stringing beads with a needle and yarn, 
yet be at a loss to demonstrate what other trait or ability or 
aptitude, if any, varies directly or inversely with capacity in 
our bead-stringing. The process of finding and proving such 
significance is called validation. No quick or conclusive dem- 
onstration is likely to be possible; each version of each test 
must run a gauntlet of criticism, including attempts to show 
statistically in what ways it is and is not meaningful. One 
query to be made in this connection concerns its ‘‘reliability,’’ 
in the technical sense of self-consistency. If it is shown sta- 
tistically, for example, that each of a large number of persons 
makes about the same score each time he takes the test—apart 
from the improvement to be expected from practice—then we 
have an important evidence of that test’s reliability. 

The test’s significance has to be further shown, moreover, by 
reference to independent (or ‘‘objective’’) criteria, to which 
notion we must now give attention. (‘‘Objective’’ is here used 
in a sense somewhat different from that discussed above.) 
Such an independent or objective criterion is a different means 
of ranking the same subjects who take the test; for example, 
they may be ranked according to their respective outputs, if 
these subjects are, or become, piece workers. To the extent 
that those people who make high scores in a standard tapping 
test are shown statistically to be actually or potentially better 
typists than those who are poor tappers—to that extent this 














524 Z. C. DICKINSON 


particular tapping test is validated as a test of typing aptitude 
by independent criteria of typing productivity. 

How may such comparisons be made to validate an aptitude 
test? Three stages, or types of procedure, may readily be dis- 
tinguished. One much used in the past consists in giving the 
test only to experienced workers in the occupation concerned 
in order to ascertain the degree of correlation—i.e., whether, 
on the average (and usually within what limits), those work- 
ers who are most and least capable according to independent 
practical criteria are also, respectively, the best and worst performers
in the test. A second stage or method (logically) is 
a variation of the first. It consists in comparing the test scores 
of practicing members of the occupation with scores made by 
non-members—that is, by people who are similar to the voca- 
tional group in all respects except occupation. If the mean 
score of the practitioners is significantly different from that of 
the ‘‘population’’ from which the members are recruited, there 
is a good chance that the test reveals aptitude for the occupa- 
tion. To what extent it does measure mere aptitude apart 
from abilities which are acquired by experience in the trade is 
a question best answered by the third stage or method of vali- 
dation, which consists in giving the test to novices before such 
vocational experience, and then waiting to see whether they 
develop practical ability (at least within limits) in proportion 
to their performances in the test which is undergoing valida- 
tion research. This last procedure obviously requires more 
time than the first two; and it is generally difficult to secure, 
over the requisite months or years, a significant number of 
cases in which both the tentative test for aptitude and the independent
criterion or criteria of validity are applied in suitable 
and standard fashions. 

Such are the stark and commonplace outlines of this concept, 
validity, with reference to mental measurements. In practice, 
of course, researches in validation become extremely complex, 
as ‘‘batteries’’ of tests and of criteria are investigated, ‘‘critical
scores’’ are found, and so on. Book, for example, gave a 




tapping test to seven ‘‘world’s champion’’ typists, who could 
write 135 words a minute or faster, and found that all could 
tap much faster than other persons of their ages. Kitson, 
however, discovered later that his own samples of women typists 
and pianists, who were apparently representative of people 
earning their livings at these vocations, did not perform sig- 
nificantly better in the tapping tests than a control group of 
other women (not typists or pianists) of the same ages. He 
inferred that this test, while it may be able to indicate cham- 
pionship aptitude for typing, does not diagnose aptitude for 
acquiring ordinary competence in either typing or piano- 
playing. These researches illustrate the notion of limits 
within which a measurement may have validity. 

I wish now to suggest that the concept validity has a broader 
significance for all manner of tests than it is commonly given 
by the specialized researchers and the textbook writers. Its 
scope is by no means confined to aptitude or prognostic tests. 
On the contrary, output measurements, rating scales, trade 
tests, all may be anvils as well as hammers in the complex proc- 
esses of validation. From each important scale, test, or other 
measurement, we may look in one direction (let us say ‘‘up- 

1 H. D. Kitson, ‘‘Determination of Vocational Aptitudes.’’ Personnel 
Journal, vol. 6 (Oct. 1927), pp. 192-198. See also Max Freyd, ‘‘Selection 
of Typists and Stenographers: Information on Available Tests.’’ Ibid., 
vol. 5 (Apr. 1927), p. 491. A ‘‘word’’ in typing tests is usually defined 
as five strokes, whether of letters, other characters, or space-bar; a penalty 
of, say, ten ‘‘words’’ being deducted for each error as defined in the 
instructions. The individual’s score in a typing test, of course, is affected 
also by other factors, such as the equipment, the degree of familiarity with 
the text or vocabulary, and relative premiums for speed and accuracy. 
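
The scoring rule described in this note can be written out as a short Python sketch. It assumes that the error penalty is deducted from the gross word count before dividing by the elapsed time; the note does not say at which step the deduction is made, and the function and parameter names are invented for illustration.

```python
def net_words_per_minute(strokes, errors, minutes,
                         strokes_per_word=5, penalty_words=10):
    """Typing score in net words per minute.

    A "word" is counted as five strokes (letters, other characters, or
    space-bar), and ten "words" are deducted for each error. Deducting
    before dividing by the time is an assumption, not stated in the note.
    """
    gross_words = strokes / strokes_per_word
    net_words = gross_words - penalty_words * errors
    return net_words / minutes


# Hypothetical five-minute test: 3,375 strokes, first with no errors,
# then with two errors.
print(net_words_per_minute(3375, 0, 5))
print(net_words_per_minute(3375, 2, 5))
```

On this reading an error-free five-minute test of 3,375 strokes scores 135 words a minute, and two errors lower the score to 131.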

With respect to many a prognostic test a minimum ‘‘critical score’’ may 
be found, below which the applicant is a very poor risk for the job. Then 
may follow a limited range, within which the better his score, the better is 
he likely to perform in the occupation. But there may be also a maximum 
critical score, above which he will become a poorer risk than somewhat
below. A certain minimum of linguistic and abstract intelligence, for
example, is essential in a typist; but beyond some higher point it becomes a 
handicap, since the very bright person tends soon to become dissatisfied 
and bored with a simple repetitive job. 













ward’’), toward the uses which this measurement may have. 
Otherwise expressed, this is the general direction in which we 
seek materials for validating the test in question. 

This test might be, for example, a tentative trade test for 
typists; in which case ratings by supervisors or instructors or 
associates, and production records, would be among the more 
promising independent criteria. Or, from this same tentative 
test or scale we may look in the opposite direction (‘‘down- 
ward’’), for materials that would help to identify such ability 
or trait in its more incipient forms—e.g., for prognostic tests 
of typing aptitude. And going one step ‘‘downward,’’ we 
may observe that an appropriate objective test in spelling is a 
prognostic sign of aptitude for becoming a commercially suc- 
cessful typist; while no doubt numerous measurements have 
already been discovered which are somewhat prognostic of 
aptitude for learning to spell.² 

Hence any important human measurement is likely to be 
both a proficiency test and an aptitude test, depending on the 
point of view. The process of demonstrating that it is a use- 
ful measure of achievement or performance or matured ability 
we may call its own validation, with respect to the ability de- 

2 H. C. Link, Employment Psychology, pp. 90, 91, 415, 416, gives the 
record of a pioneer application of a spelling test toward determination of 
typing aptitude or ability. Much fuller data, with reference to numerous 
subjects, are given by B. Muscio and S. C. M. Sowton, who found that 
scores in the spelling test used by Link correlated higher with typing
performance than nearly any other typing prognostic test they tried.
(‘‘Vocational Tests and Typewriting.’’ British Journal of Psychology, vol. 13 
(1923), pp. 344-369.) 

I have not attempted to ascertain what measurements of spelling apti- 
tude, if any, have hitherto been validated. A sidelight on the problem, 
however, is afforded by S. A. Courtis’s discussions of maturation or spontaneous
growth, in children, of spelling, as well as many other traits. See, 
e.g., his ‘‘Growth and Development in Children,’’ in Advances in Health 
Education (American Child Health Association, New York, 1934), esp. pp. 
184, 188. Apparently a child’s performances in a standard spelling test, 
at intervals, if the natures of his teaching and study and practice are 
known, will establish (through calculation of ‘‘isochrons’’) a high prob- 
ability as to his total aptitude for learning to spell correctly. 







fined for this purpose. And, so far as it is thus recognized as 
a measure of proficiency, it will serve as an instrument for 
validating or invalidating aspirants for the rôle of prognostic 
tests for detecting, while it is merely a potential capacity, that 
aptitude which the achievement or proficiency test measures 
on its mature and developed level. The same examination- 
routine, in the foregoing illustration, may be at once an achieve- 
ment test in spelling and a prognostic test of aptitude for 
becoming a typist. It is a mistake to think that prognostic 
tests of aptitude are necessarily restricted to abilities or charac- 
ters which are entirely inborn, or independent of any specific 
sort of experience. 





VARIABILITY IN AUTOMATIC MENTAL 
PERFORMANCE WITH UNIFORM 
INTENT 


DOUGLAS FRYER 
New York University 


CONSCIOUS intention of the worker to work at a different
rate from time to time has been found to accompany
all significant changes in speed of performance in 
automatic work.¹ Yet it would seem that performance over a 
period of time might be variable in slight degree without any 
clear intention of the worker to change his rate, because of 
degrees of automaticity in the working situation. 

The investigation reported here was concerned with the 
extent of these so-called unintended changes in rate of per- 
formance, or variability, in a familiar mental-manual task 
where the worker intends to work at a comfortable rate. The 
investigation was not concerned with variability where the 
worker starts the task and allows it to go unguided except on 
occasions. Nor was it concerned with the condition where performance
is guided by clear conscious intent not to change in rate, although results 
for this condition are reported from a check experiment which 
was introduced for comparative purposes. It dealt with the 
condition where workers intend only to maintain a comfortable 
rate and it measured changes in performance for equal inter- 
vals where this intent was uniform throughout a period of 
work. 

I. PROCEDURE OF EXPERIMENT 


The author’s Speed Addition Test² was administered as the 

1 See ‘‘A Genetic Study of Motivation Under Changing Auditory Situations.’’
Brit. J. of Psychol., 1934, XXIV, 408-433; ‘‘Motivation Effects 
of Auditory Timing Upon Repetitive Mental Work.’’ Brit. J. of Psychol., 
1934, XXV, 140-169. 

2 Published by Department of Psychology, New York University. 













AUTOMATIC MENTAL PERFORMANCE 529 


task. This test consists of addition problems with totals of 
less than nine in which the problems are distributed in chance 
order and arranged in rows of thirty, and where the answers 
are written below the problems. Instructions providing for a 
uniform intent to work at a comfortable pace were read indi- 
vidually by the workers before each period of work, as follows: 


Comfortable Work Instructions 


You are asked to do this work (which is the addition of long 
rows of problems with totals of less than nine) with the atti- 
tude that you are expected to do it for weeks and weeks, per- 
haps years. Under these conditions you would not race, of 
course, nor would you loaf. You would work so that there is 
relaxation, ease, and comfort, just as in any other work you do. 
We will call this a comfortable pace, and you will be asked at 
the conclusion of each short test to state ‘‘yes’’ or ‘‘no’’ in 
answer to the question: ‘‘Did you follow the instruction?’’ 

The tests will be of varying lengths: possibly one minute, 
possibly one and one-half minutes, possibly three-quarters of 
a minute. Pay no attention to this, but start with the bell and 
do not add any problems after you hear the bell again as the 
signal to stop. 


It is noted in the instructions that one introspective check 
is present: The subject writes ‘‘yes’’ or ‘‘no’’ if he followed 
instructions. 

The work was divided into five periods and performed at the 
same hour and day of the week for five successive weeks. 
There were ten tests for each period of work, which were timed 
differently to control any motivating effect of ‘‘knowledge of 
results.’’ The time limit of each test was set at about one 
minute. The exact ‘‘times’’ of the ten tests in order follow: 
(1) 90 seconds; (2) 60 seconds; (3) 50 seconds; (4) 60 sec- 
onds; (5) 70 seconds; (6) 60 seconds; (7) 50 seconds; (8) 60 
seconds; (9) 70 seconds; (10) 60 seconds. These tests fol- 
lowed each other as rapidly as possible, that is, with a few 
seconds pause between them to enable the subject to write 
‘‘yes’’ or ‘‘no’’ and to get set for the next test. This was done 
to further control any competitive influences and to make the 








530 DOUGLAS FRYER 


day’s observation of ten tests as much of a unit as possible. 
Sixteen male college students (sophomores and juniors) in the 
University College, New York University, served as the sub- 
jects. These students volunteered their time for the experi- 
ment and performed the task as a group with the individual 
subjects seated at some distance from each other in a college 
classroom. They were without any knowledge of the purpose 
of the experiment. 


II. THE RESULTS 


Scores for each test have been brought into terms of number 
of problems accomplished per minute, as the time limits for 
the various tests were different, with the nearest whole number 
taken as the result. Scores for Test 1 and for Test 10 were 
discarded for each period of work to remove any possible effect 
of ‘‘warming-up’’ or ‘‘final spurt.’’ When a subject placed 
‘‘no’’ opposite his test in answer to the question ‘‘Did you follow
instructions,’’ the score in this test was eliminated from 
the results used in the experiment. These results were then 
converted into per cent of problems per minute in Test 2 for 
each period of work, which are the results reported here. 
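
The scoring steps just described can be sketched in Python as follows. The test durations are those listed in the procedure; the problem counts in the usage example are hypothetical, the function name is invented, and rounding halves upward is an assumption, since the text says only "the nearest whole number".

```python
import math

# Time limits, in seconds, of the ten tests in each period of work.
TEST_SECONDS = [90, 60, 50, 60, 70, 60, 50, 60, 70, 60]


def percent_of_test2(problem_counts, followed_instructions):
    """Convert one subject's raw counts for one period into per cent of Test 2.

    problem_counts: problems completed in Tests 1-10.
    followed_instructions: True/False for the "yes"/"no" answer after each test.
    Returns {test number: per cent of the Test 2 rate} for the retained tests.
    Raises KeyError if Test 2 itself was eliminated (a case the text does
    not discuss).
    """
    rates = {}
    for test_no, (count, seconds, ok) in enumerate(
            zip(problem_counts, TEST_SECONDS, followed_instructions), start=1):
        if test_no in (1, 10):   # discard warming-up and final-spurt tests
            continue
        if not ok:               # subject reported not following instructions
            continue
        per_minute = count * 60 / seconds
        rates[test_no] = math.floor(per_minute + 0.5)  # nearest whole number
    base = rates[2]
    return {test_no: 100 * rate / base for test_no, rate in rates.items()}


# Hypothetical subject who answered "no" after Test 5.
counts = [150, 100, 85, 102, 119, 98, 84, 101, 120, 105]
answers = [True, True, True, True, False, True, True, True, True, True]
print(percent_of_test2(counts, answers))
```

Expressing each retained test as a percentage of Test 2 gives every subject and every period the same starting point of 100 per cent.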


The results, computed in terms of problems per minute, indicate
that variability between subjects in average rate of performance
for a period of work is sufficiently great to obscure 
the amount of individual variability in any one period of work 
when averaged for all subjects. There are S.Ds. of the distri- 
butions of the subjects’ mean scores of 11.42, 10.58, 12.25, 
12.62, and 15.15 problems per minute, respectively, in the five 
periods of work, where the means are 91.75, 97.63, 96.69, 
100.25, and 105.67 respectively. The measures of variability 
which we are seeking, that is the average of the S.Ds. of all 
subjects for the tests of a period of work, which are 5.07, 3.30, 
3.34, 3.65 and 5.26 problems per minute, respectively, for the 
five periods of work, are necessarily influenced by the wide 
differences in individual level of performance for the various 
subjects and it is necessary in order to eliminate this influence 
that individual level of performance be given some common 
denominator. Also, the subjects seem to choose varying levels 
of comfortable work on different days as is indicated by a mean 
of the S.Ds. of 5.45 problems per minute for the distributions 










of the means of the individual subjects in the different periods 
of work (computed for twelve subjects for four periods of 
work). To give both the individual subjects and the different 
periods of work a common denominator, the results in ‘‘prob- 
lems per minute’’ are brought into terms of ‘‘per cent of prob- 
lems per minute in Test 2 for each period of work.’’ Thus 
converted they are shown in Tables I and II, where all subjects 
have a common level of performance in Test 2 of each period 
of work as the basis for indicating variability in rate over a 
period of time in which the subjects intend to work at a com- 
fortable rate. 


The results in Table I compare the subjects’ means for the 
first half (Tests 2 to 5) and last half (Tests 6 to 9) of each 
period of work. These are averaged in the bottom rows of the 
table for all subjects and for twelve subjects (A to L incl.) who 
worked throughout four periods. S.Ds. for the distributions 
of the subjects’ means show the variability of subjects in level 
of performance. 

The rate of the initial experimental test (Test 2) is main- 
tained most consistently throughout the eight tests in Period 
IV, where the average of the subjects’ means for Tests 2 to 5 
is 100.32 per cent; for Tests 6 to 9 it is 101.15 per cent; and 
for Tests 2 to 9 it is 100.84 per cent. The greatest increase 
in uniformity of performance, as indicated in the comparisons 
of the mean performances of the first and last half of the tests, 
is noted between Periods I and II. The averaged mean scores 
for Periods II, III, and IV (Subjects A to L inclusive) for the 
first half of the tests (Tests 2 to 5) is 100.71 per cent; for the 
second half of the tests (Tests 6 to 9) it is 101.75 per cent; and 
for the whole series of tests (Tests 2 to 9) this average is 101.27 
per cent. The mean rate of performance increases very 
slightly indeed for all subjects in a period of work. 

But variability between subjects in mean performance is 
relatively high in the latter tests when compared with the 
earlier tests of a period of work, as shown by the S.Ds. of the 
distributions of mean scores in the bottom rows of Table 
I, which indicates that individual performance tends to diverge 
more and more from the common starting point of 100 per cent 








TABLE I

Comparison of Means for Tests 2-5 incl., Tests 6-9 incl., and Tests 2-9
incl. of Subjects and Periods of Work in Terms of Per Cent of Initial
Experimental Test (Test 2)

[Columns: Tests 2-5, 6-9, and 2-9 for each of Periods I, II, III, and IV.
Rows: individual subjects, followed by the Mean and S.D. for all subjects
and for Subjects A-L incl. The individual entries are illegible.]




as the work progresses. Individual variability in mean per- 
formance decreases progressively from the first to the fourth 
periods of work. The greatest drop in variability of mean 
performance is between Periods I and II and averages of S.Ds. 
of the distributions of mean scores for twelve subjects (A to 
L incl.) in Periods II, III, and IV are 2.87 for the tests in the 
first half of the periods; 5.52 for the tests in the last half of 
the periods; and 3.95 for the whole series of tests. This is low 
variability between subjects in performance of the kind used 
as the task in this experiment, which would indicate that the 
sample is sufficiently homogeneous to be representative of indi- 
viduals working at about 100 problems per minute (with an 
S.D. of the distribution of about 11 problems per minute). 

Increased uniformity of performance from period to period 
is indicated also by the numbers of individuals increasing their 
rate in the second over the first half of the tests in the different 
periods of work, which is shown below in per cents for all sub- 
jects, and for Subjects A to L inclusive, in the various periods 
of work:

Periods    All Subjects    Subjects A to L incl.
           Per Cent        Per Cent

88
81
46
42
50











While nearly all subjects increased their rate of performance
in the last half of the first period of work, there is a progressive
tendency for as many subjects to decrease as to increase rate
in later periods of work, and the number stabilizes at about fifty
per cent in Periods III and IV.
Table II shows a comparison of the S.Ds. of the individual
subjects’ distributions, the means for which were given in
Table I. S.Ds. are given by subjects for the tests in the first
and last half of each period of work and for the total period of
work, and they are averaged in the bottom rows of the table
for all subjects and for twelve subjects (A to L incl.) who
worked throughout four periods. Variability between subjects
in their variability in tests is indicated by the S.Ds. of the
distribution of subjects’ S.Ds. shown in the bottom rows of
the table.

[TABLE II. Comparison of S.Ds. of Distributions for Tests 2 to 5 incl., Tests 6 to 9 incl., and Tests 2 to 9 incl., for Subjects and Periods of Work, in Terms of Per Cent of Rate in Test 2. Table body illegible in the source scan.]

Individual variability is about the same on the average for 
the tests in the first and last half of each period of work and 
it is only slightly greater for all tests than for either the first 
or the last half of the tests in Periods II, III, and IV. There 
is very little difference between any of the average measures 
of individual variability in these three periods of work. The 
averaged means of the S.Ds. for Periods II to IV inclusive
(Subjects A to L inclusive) are 3.01 for the tests in the first 
half of the periods, 3.11 for the tests in the last half, and 3.68 
for all tests. The averaged S.Ds. of the distributions of sub- 
jects’ S.Ds., from which these means were secured, are 1.23, 
1.49, 1.55 respectively. With variability little greater for the 
whole series of tests than for the first and last halves of the 
series, it is suggested that a further increase in number of 
tests would not greatly influence individual variability in
performance. Also, the low S.Ds. of distributions of individual
subjects’ S.Ds. suggest that a larger sample would not affect 
the average measures of variability greatly. 

The per cent of individual subjects increasing in variability 
during the last half of each period of work is shown below for 
all subjects and for subjects A to L inclusive. 








Periods    All Subjects    Subjects A to L incl.
           Per Cent        Per Cent
I          56              58
II         44              58
III        38              33
IV         67              67
V          100             ..





Fewer subjects increase variability during the latter half of
Period III than for any other period of work, while a slightly
greater percentage increases than decreases for the other
periods, referring to the results for Subjects A to L inclusive.
The average of subjects increasing in Periods II to IV inclusive
(for Subjects A to L inclusive) is 53 per cent.
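
The 53 per cent figure can be checked with a few lines of arithmetic. The sketch below assumes that the Subjects A to L column of the preceding small table lists Periods I to IV in order (58, 58, 33, 67); the variable names are illustrative only and are not taken from the study.

```python
# Per cent of Subjects A to L whose variability increased in the last half
# of each period. The row order (Periods I to IV) is an assumption about
# how the printed table is laid out.
increasing_pct = {"I": 58, "II": 58, "III": 33, "IV": 67}

# The text averages over the three middle periods only.
middle_periods = ["II", "III", "IV"]
mean_pct = sum(increasing_pct[p] for p in middle_periods) / len(middle_periods)

# Period with the fewest subjects increasing in variability.
lowest_period = min(increasing_pct, key=increasing_pct.get)

print(round(mean_pct))   # 53
print(lowest_period)     # III
```

Under that reading the tabled values agree with both statements in the paragraph: the mean for Periods II to IV rounds to 53, and Period III has the smallest percentage.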


III. MEASURES OF TEST-CHANGE IN PERFORMANCE
WITH UNIFORM INTENT


Per cent change in performance is a useful measure of
variability for purposes of application. Table III gives
averages of such change by tests and by single and combined
periods of work. All averages are mean test-changes either
from Test 2 or from the immediately preceding test and they
are given both in terms of level of change (arithmetical mean)
and total amount of change (absolute mean). These results
are for the twelve subjects (A to L incl.) who worked
throughout four periods (Periods I to IV incl.). Periods II, III
and IV are considered most representative of the conditions
of the experiment, and the following discussion relates
only to the results for these periods, although results for all
periods are included in the table.


Uniformity of intent is a matter of degrees, of course. Com- 
fortable work instructions were administered to secure as 
normal performance as possible with what might be termed a 
uniform intent throughout a period of work. If uniformity 
of performance can be taken as evidence of uniformity of con- 
scious intent, then Period I may be regarded as offering 
results in which intent was not sufficiently uniform for the 
purposes of the experiment. That the first working period 
engendered a fairly variable intent in comparison with the 
later periods is evidenced in the higher per cent of tests (nine 
per cent) discarded because of the statements by the subjects 
that they did not follow instructions. (Five per cent of the 
tests were discarded from Period II, two per cent from Period 
III and zero per cent from Period IV.) 


[TABLE III. Average Test-change in Per Cent of Problems Per Minute by Tests (2 to 9 incl.) for Twelve Subjects (A to L incl.) in Four Periods of Work (I to IV incl.). Section A: Test-change from Test 2 (arithmetical mean of subjects). Section B: Test-change from preceding test (arithmetical mean of subjects). Section C: Test-change from Test 2 (absolute mean of subjects). Section D: Test-change from preceding test (absolute mean of subjects). Table body illegible in the source scan.]

Table III is divided into four sections, as follows:

Sections A and B: Arithmetical Mean of Subjects’ Test-changes
from Test 2 and from Preceding Test. These averages
indicate the amount of increase or decrease in performance
of the group in relation to its starting rate (in Test 2)
and in relation to the immediately preceding test. The
test-changes in the body of Sections A and B of the table were
computed by averaging the individual subjects’ changes,
expressed in per cent of performance in Test 2, taken either
as a change from Test 2 or from the preceding test.

Sections C and D: Absolute Mean of Subjects’ Test-changes
from Test 2 and from Preceding Test. These averages are the
total amount of change for the group, whether increases or
decreases, in relation to the starting rate and in relation to the
immediately preceding test. The test-changes in the body of
Sections C and D of the table are computed by averaging the
individual subjects’ changes, expressed in per cent of
performance in Test 2, without regard to signs, taken either
as a change from Test 2 or from the preceding test.
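
The distinction between the arithmetical mean (level of change) and the absolute mean (total amount of change) can be made concrete with a short sketch. The rates below are hypothetical and are not data from the study; the function names are likewise illustrative. It is assumed here, following the description above, that every change is expressed in per cent of the first test's rate, whether it is measured from the first test or from the preceding test.

```python
from statistics import mean

def changes_in_pct_of_first(rates, from_preceding=False):
    """Test-changes expressed in per cent of the first test's rate."""
    first = rates[0]
    changes = []
    for k in range(1, len(rates)):
        # Reference is either the starting rate or the preceding test.
        reference = rates[k - 1] if from_preceding else first
        changes.append(100.0 * (rates[k] - reference) / first)
    return changes

def group_means(subject_rates, from_preceding=False):
    """Per-test (arithmetical mean, absolute mean) across subjects."""
    per_subject = [changes_in_pct_of_first(r, from_preceding)
                   for r in subject_rates.values()]
    per_test = zip(*per_subject)  # regroup the changes by test
    return [(mean(t), mean(abs(c) for c in t)) for t in per_test]

# Hypothetical rates in problems per minute (not data from the study).
rates = {"X": [100, 104, 98], "Y": [100, 95, 101], "Z": [100, 102, 103]}

from_first = group_means(rates)                      # as in Sections A and C
from_previous = group_means(rates, from_preceding=True)  # as in Sections B and D
```

With these made-up rates the first test-change has an arithmetical mean of about 0.3 but an absolute mean of about 3.7, because increases and decreases cancel in the former and accumulate in the latter.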


Average test-changes from Test 2 are usually increases in
performance, although there are numerous decreases. The
average increase in performance for the three periods, shown in
Section A of Table III, is 1.4 per cent, with a S.D. of the means
of individual test-changes for these periods of 1.1 per cent.
(The average of the S.Ds. of individual period distributions of
test-changes is 1.5 per cent.) Variability for the test-changes
from period to period is greater, as indicated by an average
of the S.Ds. of the distribution of the same test-changes for
the periods of 2.7 per cent. (When computed for the mean
test-changes of the three periods this S.D. is 2.5 per cent.)

The average test-change from one test to the next is almost
zero and is shown as +0.3 per cent in Section B of the table,
with a S.D. of the means of individual test-changes for the
three periods of 1.7 per cent. (The average of the S.Ds. of
individual period distributions of test-changes is 2.3 per cent.)
Variability for the test-changes from period to period is
indicated by the average of S.Ds. of the distributions of the same
test-changes of 2.1 per cent. (The S.D. of the distribution of
mean test-changes of the three periods is 0.4 per cent.)

Thus, in a repetitive, fairly automatic, mental-manual task,
with intent to work at a comfortable rate, average change in
performance from the starting rate is expected to be an increase
of 1.4 ± 1.1 per cent to include 68 per cent of the test-changes.
From the preceding test, performance is expected to increase
0.3 ± 1.7 per cent to include 68 per cent of the test-changes.
Individuals will vary from these means of test-change, of
course. Variability (S.D.) in mean performance is shown
in Table I as increasing progressively in a period of work.
For the distribution of individual means of all tests in a
period of work there is a S.D. of 4.0 per cent (average for
Periods II, III, and IV). The expected increase in performance
from Test 2 at any test is 1.4 per cent ± 4.0 per cent and
from the preceding test, 0.3 per cent ± 4.0 per cent, to include
68 per cent of individual levels or mean rates of performance.
Group level of performance at any one test is expected to vary
between periods +1.4 ± 2.7 per cent when computed from the
starting rate in different periods of work and from the preceding
test +0.3 ± 2.1 per cent to include 68 per cent of the period
differences. This much can be said concerning level of
performance for both individuals and groups under the conditions
of the experiment.
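
The statements of the form "mean ± S.D. to include 68 per cent" can be sketched numerically. The 68 per cent figure corresponds to the share of an approximately normal distribution lying within one S.D. of its mean; that normality is an assumption supplied here for illustration and is not stated in the text. The function names are hypothetical.

```python
import math

def one_sd_interval(mean, sd):
    """Limits one S.D. below and above the mean, to one decimal place."""
    return (round(mean - sd, 1), round(mean + sd, 1))

def normal_coverage(k=1.0):
    """Fraction of a normal distribution within k S.D.s of its mean."""
    return math.erf(k / math.sqrt(2.0))

# Group test-change from the starting rate (Test 2): 1.4 +/- 1.1 per cent.
from_start = one_sd_interval(1.4, 1.1)
# Group test-change from the preceding test: 0.3 +/- 1.7 per cent.
from_preceding = one_sd_interval(0.3, 1.7)

print(from_start)                    # (0.3, 2.5)
print(from_preceding)                # (-1.4, 2.0)
print(round(normal_coverage(), 3))   # 0.683
```

On this reading, group performance would be expected to lie roughly between zero and +2.5 per cent of the starting rate, and between about -1.5 and +2.0 per cent of the preceding test, in about two cases out of three.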

Absolute average test-change in performance is indicated in
Sections C and D of Table III, where it is computed both from
the starting rate in Test 2 and from the preceding test. The
absolute average change from the starting rate is 4.8 per cent
and from the preceding test it is 4.5 per cent. The S.Ds. of
the distributions of mean test-changes for the three periods
are 0.7 and 0.6 per cent, respectively. (The means of the S.Ds.
of individual period test-changes are 1.0 and 1.0 per cent,
respectively.) Variability between periods is low, with an
average of the S.Ds. of individual test-changes for the three
periods of 0.9 per cent and 0.8 per cent when change is
computed from Test 2 and the preceding test, respectively. (The
S.Ds. of the distributions of the means of test-changes in
individual periods are 0.6 and 0.06 per cent, respectively, for the
two conditions.)

Thus, the absolute amount of change in performance to be
expected at any one time is set rather exactly at five per cent.
Expected change in absolute amount from the starting rate is
4.8 ± 0.7 per cent to include 68 per cent of test-changes; from
the preceding test it is 4.5 ± 0.6 per cent. The amount of
individual variability in these test-changes is indicated in
Table II where the S.D. for the distribution of individual
S.Ds. for all tests (averaged for Periods II, III and IV) is
1.55 per cent. Hence, expected total change from the initial
test at any one test is 4.8 ± 1.6 per cent to include 68 per cent
of the individual workers; expected total change from the
preceding test is 4.5 ± 1.6 per cent. Expected total change at
any one test when 68 per cent of period differences are
included is 4.8 ± 0.9 per cent, computed from the starting rate,
and 4.5 ± 0.8 per cent, computed from the preceding test.


IV. A CHECK EXPERIMENT 


The author has reported elsewhere⁸ an experiment in which
exactly the same conditions were present as in this experiment, 
with additional instructions administered as follows: 


The first test will be a period of choosing the pace. After
the first test you are not allowed to change in speed or rate of
adding.

There were six subjects, each of whom had performed the 
task from 78 minutes to 786 minutes of previous working time; 
hence they were as familiar with the task as the subjects in this 
experiment during the later periods of work. Results are for 
four consecutive tests, following a preliminary trial, and the 
mean of the S.Ds. for the six subjects is 2.35 per cent of
problems per minute as compared with an averaged S.D. for the
same tests in Periods II to IV inclusive for the twelve subjects
(A to L inclusive) of this experiment (Table II) of 3.00 per
cent. The average rate for the check experiment was 95.2
problems per minute while in this experiment it was 98.7 prob- 
lems per minute. 

Table IV shows the average test-changes in per cent of prob- 
lems per minute for the check experiment in relation to this 
experiment where the twelve observers are averaged for the 
same tests (Tests 2 to 5 inclusive) and for three periods of 
work (Periods II to IV inclusive). The absolute averages for 
the six subjects in the check experiment are about one per cent 
lower than for the observers in this experiment. 


8 ‘‘Specific Conscious Intent and Its Correlates in Performance,’’ Brit.
J. of Psychol., 1937, XXVII, 364-393.








TABLE IV
Average Test-change in Per Cent of Problems Per Minute in Check
Experiment Compared with This Experiment (Same Tests for
Twelve Subjects, A-L incl., Averaged for Periods II to IV
inclusive). S.Ds. of Distributions of Subjects’ Averaged
Changes Shown in Parentheses

SECTIONS  AVERAGES                                  | THIS EXPERIMENT | SIX SUBJECTS CHECK | FIVE SUBJECTS CHECK
A  Arith. Ave. Change from Test 2                   | +0.65           | +0.51 (3.69)       | -0.82 (2.38)
B  Arith. Ave. Change from Preceding Test           | -0.07           | +0.56 (1.94)       | -0.12 (1.33)
C  Absolute Ave. Change from Test 2                 | 4.2             | 3.28 (1.84)        | 2.51 (0.65)
D  Absolute Ave. Change from Preceding Test         | 4.4             | 3.07 (2.96)        | 1.76 (0.20)





Five of the observers in the check experiment offer results
from what evidently constitutes a highly homogeneous group
in such work as is involved in this experiment and their results
are shown separately in the last column of Table IV. Their
absolute average change from their starting rate is only 2.51
per cent and from the preceding test it is 1.76 per cent, and the
S.Ds. for the distributions of test-changes are 0.65 and 0.20 per
cent, respectively. The amount of individual variability is
indicated for this group, as in Table II for the major group,
by a S.D. of the distribution of individual workers’ S.Ds. of
0.32 per cent. Hence, absolute average change in performance
from the starting rate is not more than 2.5 ± 0.7 per cent for
such a group, and from the preceding test not more than
1.8 ± 0.2 per cent, to include 68 per cent of the test-changes.
Individual change from the starting rate is not more than
2.5 ± 0.3 per cent at any one test, and from the preceding test
not more than 1.8 ± 0.3 per cent, to include 68 per cent of the cases.


V. SUMMARY AND CONCLUSIONS


The purpose of the investigation reported here was to
measure the amount of change in performance in a repetitive
mental-manual task where there was maintained throughout
the period of work a uniform conscious intent to work at a
comfortable rate. The experiment was arranged to eliminate
effects of practice, fatigue, initial and end-spurts. Results
were secured for college students in five periods of work
including ten tests of approximately one minute each, and the
performance of twelve subjects during the three middle periods
(Periods II, III, and IV) represents best the conditions of the
experiment. Results are summarized for these subjects and
periods of work in terms of per cent of working rate in Test 2
for each period, with the average rate of 98.69 problems per
minute.

1. Average rate of performance for the group changes very 
little during a period of work under the conditions of this 
experiment where uniform intent is maintained to work at a 
comfortable rate (see Table I). The average working rate in 
the first half of a period of work is 100.71 per cent of the begin- 
ning rate, in the last half it is 101.75 per cent, and for the whole 
series of tests it is 101.27 per cent. Fifty-eight per cent of the 
individual subjects increased their rate during the last half of 
a period of work. The average change for each test from the 
initial test in a period of work (see Table III) is an increase 
of 1.4 per cent with a S.D. of the distribution of test-changes 
during the period of 1.1 per cent. The average increase for 
each test from the immediately preceding test is only 0.3 per 
cent with a S.D. of the distribution of test-changes during the 
period of 1.7 per cent. Period differences (see Table III) for 
the same test-change from the initial test in each period are 
indicated by a mean of the S.Ds. for test-changes in different 
periods of 2.7 per cent and from the preceding test by a mean 
of the S.Ds. of 2.1 per cent. Variability (S.D.) in individual 
mean performance (see Table I) increases from 2.9 per cent in 
the first half of the tests of a period of work to 5.5 per cent in 
the last half, and for all tests it is 4.0 per cent. Hence, average
test-change from the initial test in a period of work is
+1.4 ± 1.1 per cent to include 68 per cent of the test-changes; it is
+1.4 ± 4.0 per cent at any one test to include 68 per cent of
the differences in individual level of performance; it is
+1.4 ± 2.7 per cent to include 68 per cent of differences for the same
test-changes in different periods of work. From the immediately
preceding test, the average test-change is +0.3 ± 1.7 per
cent to include 68 per cent of the test-changes in a period of
work; +0.3 ± 4.0 per cent at any one test to include 68 per
cent of the mean levels of individuals; and +0.3 ± 2.1 per cent
to include 68 per cent of differences for the same test-changes
in the various periods of work.

Level of performance for a group of workers is expected to 
change from its starting rate at any time in a period of work 
between about zero and +2.5 per cent and from an immediately
preceding test between -1.5 per cent and +2.0 per cent.
Individual workers will extend this range of average performance
about 4 per cent in either direction. This will be the usual 
situation in repetitive mental-manual work, such as was used 
in this experiment, with the intent of the worker to maintain 
a comfortable pace. 

2. Individual variability throughout a period of work, with 
which this experiment was primarily concerned, is very low in- 
deed when the worker is working with intent to maintain a 
comfortable pace (see Table II). Average S.D. variability in 
the first half of a period of work is 3.0 per cent and for the 
last half it is 3.1 per cent, with little increase for the whole 
series of tests in a period of work for which the average S.D. 
is 3.7 per cent. Only with fifty-three per cent of the subjects 
is individual variability increased in the last half of a period 
of work. The absolute average of the changes for each test 
from the initial test in a period of work (see Table III) is 4.8 
per cent with a S.D. of distribution of test-changes of 0.7 per
cent. The absolute average change for each test from the
immediately preceding test is 4.5 per cent with a S.D. of 0.6 per
cent for the distribution of test-changes during the period. Period
differences (see Table III) for the same test-change from the
initial test in each period of work are indicated by a mean of
the S.Ds. for all test-changes of 0.9 per cent and from the
preceding test by a mean of the S.Ds. of 0.8 per cent. Individual
variability in test-change is indicated by the S.Ds. of the
distributions of individual S.Ds. for tests in a period of work (see
Table II) and for the first half of the tests this S.D. is 1.2 per
cent, the last half 1.5 per cent, and for the whole series of tests
in a period of work it is 1.6 per cent. Hence, absolute
test-change from the initial test in a period of work is 4.8 ± 0.7 per
cent to include 68 per cent of the test-changes; it is 4.8 ± 1.6
per cent at any one test to include 68 per cent of the
individuals; and it is 4.8 ± 0.9 per cent to include 68 per cent
of the period differences for the same test-change. From the
immediately preceding test, absolute test-change is 4.5 ± 0.6
per cent to include 68 per cent of the test-changes in a period
of work; 4.5 ± 1.6 per cent at any one test to include 68 per
cent of the cases; and 4.5 ± 0.8 per cent to include 68 per cent
of the period differences for the same test-change.

Individuals can be expected to change in rate of performance 
from the preceding test or from the initial test in a period of 
work between 2.5 and 7.0 per cent and the same test-change in 
different periods of work will not vary more than 1.0 per cent 
from this. These figures are for individual changes in per- 
formance when the worker intends to maintain a comfortable 
rate. 

3. For a highly trained group of workers intending to main- 
tain their rate of work without change while working at a com- 
fortable pace, the absolute average change from the initial test 
is 2.5 per cent and from the immediately preceding test 1.8 per 
cent, with S.Ds. of the distributions of test-changes of 0.7 and
0.2 per cents, respectively (see Table IV). Individual vari- 
ability is indicated by a S.D. of the distribution of individual 
S.Ds. for tests in a period of work of 0.3 per cent. 

Such individual workers, intending not to change their rate 
of comfortable work, will change between 1.5 and 3.5 per cent 
from their starting rate and between 1.3 and 2.3 per cent from 
the immediately preceding test. Absolute average change for 
them is about 2 per cent as compared with 5 per cent for the 
major group in this experiment intending only to maintain a 
comfortable pace. 
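
The ranges quoted for the trained group are numerically consistent with adding the S.D. of test-changes to the S.D. of individual variability and laying the sum on either side of the mean. That reading is an inference from the figures, not a procedure stated in the text, and the function below is a hypothetical sketch of it.

```python
def combined_range(mean, sd_tests, sd_individuals):
    """Mean minus/plus the simple sum of two S.D.s, to one decimal.

    Adding the S.D.s is an assumed reading of how the quoted ranges
    were reached; the text does not describe the computation.
    """
    spread = sd_tests + sd_individuals
    return (round(mean - spread, 1), round(mean + spread, 1))

# Change from the starting rate: 2.5 per cent, S.D.s of 0.7 and 0.3.
from_start = combined_range(2.5, 0.7, 0.3)
# Change from the preceding test: 1.8 per cent, S.D.s of 0.2 and 0.3.
from_preceding = combined_range(1.8, 0.2, 0.3)

print(from_start)       # (1.5, 3.5)
print(from_preceding)   # (1.3, 2.3)
```

Both results match the limits given in the paragraph above (1.5 to 3.5 per cent, and 1.3 to 2.3 per cent).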





THE SUCCESS OF SIXTY SUBJECTS IN 
ATTEMPTING TO RECOGNIZE 
THEIR HANDWRITING 


ORAN W. EAGLESON 
Spelman College 


PURPOSE 


This is the report of a short study that was made for the
purpose of securing data on the success of sixty Negro
college girls in attempting to recognize specimens of their
handwriting. The subjects were mainly of sophomore
standing. Their average age was 19.5 years.


PROCEDURE 


In order to secure the six specimens that were used in the 
study, the subjects were furnished with new, freshly sharpened
pencils of the same brand and quality of lead and indexing
cards 4” by 5”, ruled on only one side. The instructions
were the same for all the subjects. On no card was the subject 
permitted to erase or cross out words or make extra marks of 
any kind. The purpose of the study was not divulged before 
or during the preparation of the specimens. 

The first specimen was made by writing in the usual style a 
dictated prose passage of forty-two words on the ruled side of 
one card. The amount of indentation and the line on which 
to begin were specified, namely one-fourth inch in from the left 
and the third line down from the top. The second specimen 
differed only in that it was written on the unruled side of a 
card. 

The third specimen was made on the ruled side of a card by
copying from the blackboard in the usual manner of writing
the verse ‘‘Twinkle, twinkle little star.’’ The fourth specimen
differed only in that it was copied on the unruled side of a
card.

The fifth specimen was made by writing in the usual manner 
the alphabet in separate small letters on the third line from 
the top beginning one-fourth inch in from the left. 

The sixth specimen was made by having the subjects make 
the numbers from one through nine and zero as they usually 
do. The numbers were arranged on the ruled side of a card 
in two uniform columns with a line between each number. 

After the specimens had been made, the subjects’ names 
were removed from each card by cutting one-fourth inch from 
the top. Each card was numbered and the name and number 
were recorded to enable identification in the future. 

The data were obtained from each subject individually.
Before the cards were presented, the subject was questioned as to
whether or not she thought she could recognize the specimens 
that she had written a week before. Then, the cards were 
shown for a particular specimen one at a time at the rate of 
one card every ten seconds until the subject selected a card as 
the one which she had prepared. The number of the card was 
recorded without the subject’s knowing whether or not it was 
the right one. Then, out of the sight of the subject, the card 
was mixed with the others for a second and third selection. At 
the end of the sitting the subject was questioned as to factors 
that aided in making the selections. 


RESULTS 
Table I shows that well over fifty per cent of the subjects
were able to identify their prose, poetry and letters, but less
than fifty per cent succeeded in recognizing their numbers in
any of the three attempts.

TABLE I
The Number and Per Cent of the Subjects Who Were Successful in Recognizing Their
Cards for Each of the Three Presentations

NO. OF        | PROSE-RULED | PROSE-UNRULED | POETRY RULED | POETRY UNRULED | LETTERS     | NUMBERS
PRESENTATION  | No. | %     | No. | %       | No. | %      | No. | %        | No. | %     | No. | %
First         | 47  | 78.3  | 48  | 80.0    | 45  | 75.0   | 57  | 95.0     | 39  | 65.0  | 24  | 40.0
Second        | 49  | 81.6  | 51  | 85.0    | 49  | 81.6   | 54  | 90.0     | 47  | 78.3  | 26  | 43.3
Third         | 55  | 91.6  | 50  | 83.3    | 52  | 86.6   | 56  | 93.3     | 48  | 80.0  | 26  | 43.3
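
The percentages in these tables follow directly from the counts out of sixty subjects. A minimal sketch is given below; it assumes, from entries such as 49 of 60 being printed as 81.6 rather than 81.7, that the printed values were truncated to one decimal rather than rounded. The function name and dictionary are illustrative only.

```python
def pct_truncated(count, total=60):
    """Per cent of `total`, truncated (not rounded) to one decimal."""
    # Integer arithmetic avoids rounding up, e.g. 81.67 stays 81.6.
    return (count * 1000 // total) / 10

# Counts for the first presentation, as given in Table I.
first_presentation = {
    "prose-ruled": 47,
    "prose-unruled": 48,
    "poetry-ruled": 45,
    "poetry-unruled": 57,
    "letters": 39,
    "numbers": 24,
}

first_pct = {k: pct_truncated(v) for k, v in first_presentation.items()}

print(first_pct["prose-ruled"])   # 78.3
print(first_pct["numbers"])       # 40.0
print(pct_truncated(49))          # 81.6
```

Only the numbers specimen falls below fifty per cent on the first presentation, in agreement with the statement above.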

Table II shows that well over half of the subjects recognized 
their prose, poetry and letters in each of the three presenta- 
tions without a single mistake or omission, while well under 
half recognized their numbers. 

TABLE II
The Number and Per Cent of the Subjects Who Were Successful in All
Three Presentations for a Particular Specimen

PROSE-RULED | PROSE-UNRULED | POETRY RULED | POETRY UNRULED | LETTERS     | NUMBERS
No. | %     | No. | %       | No. | %      | No. | %        | No. | %     | No. | %
44  | 73.3  | 44  | 73.3    | 45  | 75.0   | 52  | 86.6     | 38  | 63.3  | 23  | 38.3


























Table III shows that forty-five per cent of the subjects were 
mistaken in all three attempts to select their numbers, while 
the range of mistaken selections for the other five specimens 
was from thirteen per cent down to three per cent. 

TABLE III
The Number and Per Cent of the Subjects Who Were Mistaken in All
Three Presentations for a Particular Specimen

PROSE-RULED | PROSE-UNRULED | POETRY RULED | POETRY UNRULED | LETTERS     | NUMBERS
No. | %     | No. | %       | No. | %      | No. | %        | No. | %     | No. | %
5   | 8.3   | 5   | 8.3     | 8   | 13.3   | 2   | 3.3      | 6   | 10.0  | 27  | 45.0























Six (10 per cent) of the sixty subjects recognized their work
for all six specimens in each of the three presentations. Thir- 
teen (21 per cent) of the subjects identified all of the speci- 
mens except the numbers in each of the three presentations. 
Twenty-one (35 per cent) subjects were successful in selecting 
their prose and poetry each time, but failed to recognize either 
their letters or numbers during the presentations. Only one 
subject failed to recognize any of her work in any of the 
presentations of the six specimens. 



















Table IV shows that well over half of the subjects were con- 
fident they could recognize their prose and poetry specimens, 
but less than half were as confident that they could identify 
their numbers. 


TABLE IV
The Confidence of the Subjects Before Seeing the Cards

CONFIDENCE | PROSE AND POETRY | LETTERS      | NUMBERS
           | No. | %          | No. | %      | No. | %
Could      | 50  | 83.3       | 41  | 68.3   | 26  | 43.3
Could Not  | 1   | 1.7        | 0   | 0      | 8   | 13.3
Doubtful   | 9   | 15.0       | 19  | 31.7   | 26  | 43.3








Each of the sixty subjects—both the successful and the un- 
successful ones—gave the ‘‘ general appearance’’ of their writ- 
ing as the factor which enabled them to make their selections. 
More specific questioning revealed that they considered the 
slant, heaviness or lightness of the writing, and the formation 
of certain letters as the principal factors that influenced their 
selections. 

CONCLUSIONS 


This study does not permit the drawing of general conclu- 
sions, but the data suggest that there may be many people who 
cannot identify their handwriting, more who cannot recognize 
their letters, and more still who are unable to recognize their 
numbers. There is also the suggestion that a larger number of 
people believe they can recognize their handwriting than really 
can. It seems that the shape of the letters, the slant of the 
writing, and the heaviness or lightness of the writing serve as 
the more important factors enabling selections on the basis 
of what the subjects call ‘‘general appearance.’’ 











WIDE RANGE MULTIPLE CHOICE 
VOCABULARY TESTS 


C. R. ATWELL AND F. L. WELLS 
Boston Psychopathic Hospital 


SINCE its introduction by Terman in 1916, the vocabulary 
test has steadily grown in importance for mental measurement, 
as its meaning has become better understood. Here are offered 
two tests generally suitable for the ordinary purposes and 
levels to which vocabulary tests are adapted. In educational 
terms, these range from third grade through college. 
Perfect scores in the present tests are so rare as to be not yet 
observed. There is also some discussion of principles that un- 
derlie the construction of vocabulary tests, and hints are given 
regarding their use in practise. 

As part of an eclectic series of tests (‘‘Intelligence Com- 
posite’’ used in studies of deterioration) it was desired to con- 
struct multiple choice vocabulary tests having comparatively 
wide range, such as that in the Stanford revision of the Binet 
series. The word list there used was so arranged, with five 
possible choices for response. Definitions and misleads were 
selected on the basis of experience with the Stanford test itself. 
This and other series which were constructed for the above pur- 
pose are named by the first word in them. Thus the one using 
the Stanford words is termed the ‘‘orange’’ list. Another test 
(‘‘saucer’’ list) was constructed, in which the hundred words 
were selected by chance (position on page) from a 20,000 word 
dictionary (Webster’s Primary School); the original misleads 
were here necessarily the authors’ estimates as to what words 
would prove misleading. 

While no pattern of responses to choose from was invariably 
followed, in general the responses to each stimulus word were 
meant to consist of one definition (the correct response), one 
opposite, two ‘‘misleads’’ and one absurdity (occasionally the 
‘‘absurdity’’ proved an attractive mislead). Care was used 
that the correct response at least should be a word of less 
difficulty than the stimulus word; and that the misleads should 
not be too close to the exact definition. 

The position of the correct response among the five choices 
was determined by chance, dealing out playing cards of the 
four suits in the 1 to 5 denominations, so that each position is 
occupied by the correct response an equal number of times, with 
a possibility of the same position for eight consecutive items. 
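
The card-dealing scheme described above can be sketched in Python. This is only an illustration under stated assumptions: that the hundred items are covered by five successive deals of a twenty-card pack (four suits, denominations 1 to 5), and that the pack is reshuffled between deals. The function names are hypothetical and do not come from the authors.

```python
import random
from collections import Counter

def deal_positions(n_items=100, n_positions=5, n_suits=4, seed=None):
    """Assign the correct-response position for each item by dealing packs.

    Assumption: one pack holds n_suits cards of each denomination
    1..n_positions, and packs are reshuffled and dealt in succession
    until n_items positions have been assigned.
    """
    rng = random.Random(seed)
    pack = [d for d in range(1, n_positions + 1) for _ in range(n_suits)]
    positions = []
    while len(positions) < n_items:
        rng.shuffle(pack)          # shuffle in place, then deal the whole pack
        positions.extend(pack)
    return positions[:n_items]

def longest_run(seq):
    """Length of the longest run of identical consecutive values."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

positions = deal_positions(seed=1937)
counts = Counter(positions)        # each position occurs equally often
```

Under these assumptions a run can never exceed eight: at most four cards of one denomination close one deal and four more open the next, which agrees with the possibility noted in the text.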

For criticism, preliminary drafts of these tests were given to 
several individuals and to a few small groups. Some minor 
changes were indicated, where misleads proved too misleading. 
‘‘Bird’’ for example as a mislead for ‘‘canard’’ proved 
objectionable because of the comparatively common use in 
restaurants of ‘‘canard’’ for ‘‘duck.’’ ‘‘Preacher’’ as a mislead 
for ‘‘minister’’ depended more upon a misreading of the stimulus 
word than upon knowing the meaning of the word. 

After revision, the tests were given to various scholastic 
groups with a view to norms. Approximately 1400 cases were 
used; these came from the third, fourth, fifth, and sixth grades, 
the four years of high school, and the junior and senior classes 
of college. Consistent differences in medians and distributions 
were found between the classes. Comparison by age alone 
yielded less well marked differences. Persons who were re- 
tarded or advanced in their classes proved an upsetting factor 
in determining ‘‘age’’ norms. Distribution curves were ac- 
cordingly plotted for groups composed of persons of average 
age for their class in school. Medians of these distributions 
showed sufficient tendency towards a straight line to justify 
drawing a representative curve for the interpretation of norms. 
Medians obtained on the ‘‘saucer’’ list conform more closely to 
the curve obtained in this way than do those obtained from the 
‘‘orange’’ list. Because of this a further test, the ‘‘alleviate’’ 
list, has been constructed to serve as a more satisfactory 
alternate for the ‘‘saucer’’ list. It is often desirable that words 
in a test should be in order of difficulty rather than alphabetical,* 
but this is only partly possible since order of difficulty varies 
with the maturity of subjects, and other circumstances. The 
present arrangement is an order of difficulty derived from 
school grades 6, 8, 10, 11, College Junior and Senior. 

Correct responses for the given age groups have their medians 
as follows : 








         “SAUCER” LIST    “ORANGE” LIST    STANFORD-BINET
           1429 cases       1469 cases          NORM
Age          Score            Score
  8            21               17               20
  9            26               23
 10            31               29               30
 11            36               35
 12            42               41               40
 13            47               47
 14            52               53               50
 15            58               59
 16            63               65              (65)
 17            68               71
 18            73               76              (75)
 19            79               82
 20            84               88
 21            89               94





(No perfect score has yet been recorded for either list) 


The above norms compare to those of the Stanford test with 
a closeness unexpected, in view of the difference in presentation. 
Such multiple choice tests seem to measure the number of words 
somehow known rather than the degree to which they are known. 
These two functions of vocabulary tests have not been fully 
distinguished. Thus the findings of Johnson O’Connor on the 
relation of vocabulary (with revisions of Inglis tests) to 
executive ability may well be based on a measurement of precision in 
understanding of words rather than a wide knowledge of them. 

* Order of difficulty is generally to be preferred for clinical use, where 
the test can be stopped when it becomes too difficult. Where the test is 
self-administered, alphabetical order is usually preferable as it diminishes 
the sense of failure when too difficult words are encountered. It has been 
noted that subjects underestimate their errors on an alphabetical list and 
overestimate them when the list is in order of difficulty. 

At present the writers are experimenting with vocabulary 
tests designed to measure precision in the understanding of 
words. Such a test, having stimulus words of difficulty similar 
to those in the test previously discussed, illustrates effectively 
the differential rôle of the misleads. For 134 cases (two groups, 
one of 73 and one of 32 persons, and 29 individual tests), the 
average score on such a test is 27 points lower than on the 
‘‘saucer’’ test: 57 correct as contrasted with 84 correct. This 
precision test also takes the observer about three times as long 
to complete. 

In reference to the quality of misleads, Johnson O’Connor 
comments on the differential effectiveness of misleads, which 
difference he desires to eliminate (Psychometrics, pp. 247 ff.) 
but the only practical way to do this is to have none of the mis- 
leads function as such. In items where the answers are pure 
guesses the position of the misleads is a distinct factor in the 
ones which are selected. In such a selection a definite pattern 
is found, positions 1, 5, 3, 4, 2, being chosen with decreasing 
frequency in that order. For a test designed to measure pre- 
cision in the understanding of words, sufficient numbers of 
words that shade equally from a correct meaning simply do not 
exist. Further, the functions of the misleads differ according 
to culture group. Thus, ‘‘limpet’’ is marked incorrectly by 
almost the same percentage of sixth grade (67.2 per cent) and 
college students (65.1 per cent). 57 per cent of the sixth grade 
students choose ‘‘cripple’’ and 3 per cent choose ‘‘bird’’; 59 
per cent of college students gave ‘‘bird’’ and none of them 
gave ‘‘cripple.’’ As O’Connor points out (Psychometrics, 
p. 250), the person who has no idea what the stimulus word 
means has more chance of guessing the correct answer than 
does the person who has vague thoughts about the word, 
definitions of words sounding like it, words that a part of the 
stimulus might suggest, words that suggest the meaning by being its 
opposite, words that actually convey part of the meaning. 
Under these circumstances the rôle of ‘‘chance’’ in the sense 
of dice throws in determining a choice is reduced to very small 
proportions. 

Notes on use: The advantages of the written choice feature in 
respect to objectivity and convenience, need no comment. For 
some purposes, mainly clinical, it is best to discard the multiple 
choice feature, and give qualitatively and orally as in Terman’s 
original procedure, noting carefully the phraseology of re- 
sponses. Norms are not significantly affected by this variation 
of procedure. The multiple choice procedure may also be used 
orally, where for any reason, individuals of good intelligence 
are unable to respond in writing. This is also a practical and 
convenient procedure with groups, where test forms are not 
available. This method takes considerably longer and makes 
the test slightly more difficult for most people. 

For a rapid impression every tenth word may be considered, 
with additional words at the critical point. 

For a vocabulary test as part of an ‘‘IQ,’’ every other word 
may be considered. 

For the gauging of vocabulary in relation to other mental 
functions, or for functions of personality study, the hundred 
words should be considered. 








ALPHA    VOCABULARY                   ALPHA    VOCABULARY
SCORE    INTERP. AV.      S.D.        SCORE    INTERP. AV.      S.D.

 195        90.0      [illegible]      145        72.8         10.26
 190        88.3          4.41         140        71.1         11.20
 185        86.5          5.46         135        69.4         10.90
 180        84.9          6.88         130        67.6         11.22
 175        83.1          7.54         125        65.9         12.26
 170        81.4          7.22         120        64.2         11.02
 165        79.7          6.58         115        62.4         12.84
 160        78.0          8.86         110        60.7         11.88
 155        76.2         13.02*        105        59.0         11.64
 150        74.5          8.26         100        57.2         13.48








* Bimodal distribution, medians approximating 86 and 66, respectively. 










In the comprehensive study of the individual it is desirable 
to know if the relation of the vocabulary performance to other 
psychometric functions is out of the ordinary. Figures are 
given with respect to alpha, but with considerable reservation, 
owing to the present uncertainties regarding age factors. For 
a group of 787 cases, males, ranging in age from 15 to 29, the 
relationship of scores is given in the table above. 

The above table presents the figures interpreted from a 
straight-line curve drawn to represent the obtained averages 
of vocabulary scores at the various Alpha levels. The S.D.’s 
apply to the actual averages. 
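
As a rough illustration of how the tabled figures might be consulted, the following Python sketch interpolates the interpreted vocabulary average for an Alpha score and expresses an individual's departure from it in S.D. units. The interpolation between tabled levels, the use of the nearest level's S.D., and the function names are assumptions made for the example, not a procedure given by the authors; the values are those of the table above.

```python
# Interpreted vocabulary averages and S.D.'s by Alpha score, from the table.
# The S.D. at Alpha 195 is not legible in the source and is left as None.
NORMS = {
    195: (90.0, None), 190: (88.3, 4.41), 185: (86.5, 5.46), 180: (84.9, 6.88),
    175: (83.1, 7.54), 170: (81.4, 7.22), 165: (79.7, 6.58), 160: (78.0, 8.86),
    155: (76.2, 13.02), 150: (74.5, 8.26), 145: (72.8, 10.26), 140: (71.1, 11.20),
    135: (69.4, 10.90), 130: (67.6, 11.22), 125: (65.9, 12.26), 120: (64.2, 11.02),
    115: (62.4, 12.84), 110: (60.7, 11.88), 105: (59.0, 11.64), 100: (57.2, 13.48),
}

def expected_vocabulary(alpha):
    """Linearly interpolate the interpreted vocabulary average for an Alpha score."""
    levels = sorted(NORMS)
    if not levels[0] <= alpha <= levels[-1]:
        raise ValueError("Alpha score outside the tabled range 100-195")
    for lo, hi in zip(levels, levels[1:]):
        if lo <= alpha <= hi:
            frac = (alpha - lo) / (hi - lo)
            return NORMS[lo][0] + frac * (NORMS[hi][0] - NORMS[lo][0])
    return NORMS[levels[0]][0]  # only reached if a single level is tabled

def deviation_in_sd(alpha, vocabulary):
    """Departure from the expected vocabulary score, in S.D. units.

    Assumption: the S.D. of the nearest tabled Alpha level is used.
    """
    nearest = min(NORMS, key=lambda level: abs(level - alpha))
    sd = NORMS[nearest][1]
    if sd is None:
        raise ValueError("no S.D. available at this Alpha level")
    return (vocabulary - expected_vocabulary(alpha)) / sd
```

Given the authors' reservation about age factors, any such deviation is best read as a coarse indication only.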














AN EVALUATION OF THE FERGUSON FORM 
BOARDS AND THE DERIVATION OF NEW 
AGE AND GRADE NORMS 


PART I—PROCEDURE AND DERIVATION OF NORMS 


GRIFFITH W. WILLIAMS AND JANET LINES¹ 
University of Rochester 


INTRODUCTION 


THE value of form boards as adjuncts to verbal tests of 
intelligence has been demonstrated and further work on 
their construction and refinement is justified by their 
usefulness. One of the series extensively used is the Ferguson 
Form Boards (3). This series consists of six boards presenting 
a progressive increase in difficulty. The essential character- 
istics of the series are shown in Figures 1, 2 and 3. 

Each board consists of six spaces into which are fitted blocks 
of various patterns. While each of the inserts for the first 
board consists of a single block, those for all the other boards 
are divided into two. A further complication is introduced 
in the blocks of Boards 3, 4, 5 and 6. In Boards 3, 5 and 6 
the sides of the blocks have a straight or single bevel and in 
Board 4 the blocks have a ‘‘V’’ or double bevel. These fea- 
tures are shown in the accompanying figures. 

[FIGURE 1. Boards I and II Together with the Placement of the Blocks.]

[FIGURE 2. Boards III and IV Together with the Placement of the Blocks.]

[FIGURE 3. Boards V and VI Together with the Placement of the Blocks.]

¹ The children used in this study were made accessible through the 
generous cooperation of the Reverend J. M. Duffy, Superintendent of 
Parochial Schools in the Diocese of Rochester; Mr. T. R. Morgan, 
Superintendent of the East Rochester Schools; Sister M. Ethelbert, 
Principal of Corpus Christi, and Miss M. Chesbro, Principal of the 
Elementary School at East Rochester. The writers gratefully 
acknowledge this cooperation. 

The two preceding studies of these form boards (2, 3) may 
be criticized when judged by the criteria now current for the 
construction of tests. Several of these deficiencies may be 
briefly mentioned: 

1. The directions for administration were inadequate as 
situations arose for which no provision was made in the 
instructions. 

2. The procedure was not completely standardized as in 
neither study had provision been made for the orderly place- 
ment of the blocks. 

3. The method of scoring was so inadequately described that 
its validity may be questioned. While both experimenters 
assign values for the completion of the boards within specified 
time intervals, neither discusses the validity of these values 
and their use as measures of performance may be questioned. 

A further criticism of the method of scoring is the fact that 
neither author uses incomplete performances on a board and 
hence reduces the reliability of the series as a test. 

4. Too few subjects were used. The earlier study used 364 
subjects ranging from the first grade to the senior year in 
college, a range that would permit an average of but 23 subjects 
for each grade. While the later study uses 479 subjects from 
ages 9 to 16, norms are given only for males from ages 9 to 13 
inclusive. 

5. The reliability of the test is ignored in both studies while 
the validity is considered only in the earlier and that in a man- 
ner that is difficult to interpret. 

It was in an attempt to eliminate at least some of these short- 
comings that the present restandardization was undertaken. 
In this study the following problems have been considered : 

1. The uniformity of procedure in the presentation of the 
test. 

2. The derivation of norms from an unselected and repre- 
sentative group. 

3. The determination of the validity of the test. 

4. The determination of the reliability of the test. 


SUBJECTS 


The testing was restricted to two schools which were chosen 
because they satisfied the need of securing a school population 
which was heterogeneous in its social standing. The schools 
selected were the Corpus Christi School, Rochester, N. Y., and 
the East Rochester Public School, East Rochester, N. Y. Corpus 
Christi is a parochial school which is so situated that some 
of its pupils come from a superior residential area and others 
from a typically poor area in a large commercial city. The 
East Rochester School is the only school in an industrial city 
of 7,000 inhabitants. Its pupils can, therefore, be considered 
thoroughly representative of the occupational and social classes 
in the entire community. 

The total number of subjects tested for the Ferguson Form 
Board norms is 582, distributed in grades 2 to 8 and from 5 
to 17 years, inclusive. The method of selection was to take 
every third child in the grades of each school. As this would 
give but slightly more than 80 subjects for each grade, this 
number was selected as the minimum for each grade. The 
names of the pupils were arranged alphabetically previous to 
their selection so as to avoid a predominance or deficiency of 
any foreign group. 

As the grades are divided into two or more sections at these 
schools, the method of selection was applied to each section 
individually so as to avoid any distortion of the norms which 
would occur as a result of homogeneous grouping. 
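
The selection rule described above (alphabetize, take every third child, and apply the rule to each section separately) might be sketched as follows. The starting point of the count is not stated in the text and is treated here as a parameter; the section labels and pupil names in the usage example are hypothetical.

```python
def every_third(names, start=0):
    """Systematic sample: alphabetize the names, then take every third one.

    Assumption: `start` (0, 1 or 2) marks where the count begins; the
    source does not say which child in each trio was taken.
    """
    ordered = sorted(names, key=str.casefold)
    return ordered[start::3]

def select_subjects(sections, start=0):
    """Apply the every-third rule to each section separately, then pool."""
    selected = []
    for section_names in sections.values():
        selected.extend(every_third(section_names, start))
    return selected

# Hypothetical sections of one grade, for illustration only.
grade_two = {
    "2A": ["Dunn", "Adams", "Cole", "Baker", "Evans", "Ford"],
    "2B": ["Hill", "Gray", "Ives"],
}
chosen = select_subjects(grade_two)
```

Sampling within each section keeps a homogeneously grouped section from being over- or under-represented, which is the distortion the paragraph above is guarding against.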

The results obtained from testing such a group of children 
should, other factors being controlled, yield norms representa- 
tive of average, urban, school children. While about 60 per 
cent of the 620 pupils at Corpus Christi and about 50 per cent 
of the 1373 pupils at East Rochester are of foreign extraction, 
this does not necessarily invalidate the norms for use with an 
American group. In the latter school certainly, and probably 
in the former school, the pupil population reflects accurately 
this element in the community. Those of foreign extraction 
are either the first or second generation born in this country. 
The percentage of immigrant children is so low as to be 
negligible. While this element serves to make the test group 
representative of the population as a whole, it may also be pointed 
out that Pintner maintains (8: pp. 446-447) that there is no essential 
difference between the foreign and the American group on a 
non-language test. 

None of the Corpus Christi and very few of the East Roch- 
ester subjects had been given any mental test previous to this 
one. None had previously taken a performance test, hence no 
problem arises from the effects of former testing experience. 

From each grade population that had been tested, 50 subjects 
were selected to be given the Kuhlmann-Anderson test. This 
selection was made from those who had previously been tested 
on the form boards. Of these 50 subjects, 25 were from each 
school. As Corpus Christi provides a third of the test group 
for each grade, i.e., from 24 to 26 pupils, it was necessary to 
eliminate only an occasional name from the list for the Kuhl- 
mann-Anderson testing, and this was done by chance. As 
two-thirds of the testing group for each grade comes from the 
East Rochester school, i.e., 54 to 56, it was necessary to elimi- 
nate slightly more than one-half. This was done by arranging 
the names of those who had been given the form boards in alpha- 
betical order and eliminating every second name. Those still 
in excess of the required number were eliminated by chance. 
The only variation of this procedure was in the third and fourth 
grades at East Rochester where a few of the children had taken 
the Kuhlmann-Anderson test about 6 months before. These were 
eliminated as possible subjects for this test before the final 
selection was made. 
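
A minimal sketch of this reduction to 25 subjects per school and grade is given below. It assumes that "eliminating every second name" keeps the first, third, fifth and so on of the alphabetical list, and that elimination "by chance" can be represented by a random draw; neither detail is specified in the text, and the function name is hypothetical.

```python
import random

def select_for_verbal_test(names, quota=25, exclude=(), halve=False, seed=None):
    """Reduce a form-board group to `quota` names for the verbal test.

    exclude : pupils who had recently taken the verbal test (dropped first)
    halve   : if True, alphabetize and eliminate every second name
              (assumed to keep the 1st, 3rd, 5th, ... names)
    Any names still in excess of the quota are eliminated by chance.
    """
    rng = random.Random(seed)
    excluded = set(exclude)
    pool = sorted((n for n in names if n not in excluded), key=str.casefold)
    if halve:
        pool = pool[::2]
    if len(pool) > quota:
        keep = set(rng.sample(range(len(pool)), quota))
        pool = [n for i, n in enumerate(pool) if i in keep]
    return pool
```

With a group of 54 to 56 names, halving leaves 27 or 28, so only two or three remain to be removed by chance; with 24 to 26 names no halving is needed and at most one name is dropped.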

PROCEDURE 


To insure the occurrence and subsequent elimination of any 
difficulties which other examiners might encounter, it was 
found necessary to test 150 children in a preliminary investi- 
gation. The results of these tests are not included in the norms 
to be provided in this report. As problems arose, the instruc- 
tions were tentatively modified and used in further testing. 
The final standardization of procedure was adopted when 50 
children had been tested without the occurrence of any further 
difficulties. This restandardization of the instructions was 
done in order to increase the reliability of the test as previous 
experimenters had left certain decisions to the discretion of the 
examiner. 

It became necessary to formulate a system whereby each 
block would be placed in the same relative position for each 
subject, and which would be simple to administer. A stand- 
ardized arrangement is particularly important for Board 6 
as all the units in this board have ambiguous blocks, i.e., one 
block in each unit that may be inserted into two places but only 
one of which is correct. Both the ordered placement of the 
blocks in the entire series and the significance of the chance 
element in the placement of the ambiguous blocks have been 
ignored in previous studies. 

Situations which occurred during the preliminary testing 
revealed several inadequacies in the instructions provided by 
previous experimenters, notably with reference to stopping 
the test after successive failures on two of the boards (2: p. 
126). In the present standardization the entire series is given 
to the subject irrespective of his performance on any given 
board. This procedure was adopted on the assumption that 
the entire series of 6 boards constitutes the test and also because 
it was found that many subjects would fail on two successive 
boards and then partially or wholly succeed on subsequent 
boards. By discontinuing the test after two failures, the test 
in some instances would be reduced to one-half and in many 
cases to two-thirds its maximum length. While this in itself 
would reduce the reliability of the test, the subject’s capacity 
would, furthermore, not be adequately measured as it has been 
found that subjects adjust their performance to the difficulty 
of the task presented to them. Thus an older subject may do 
no better than a younger one on a test devised for younger 
subjects but will surpass the younger on tests devised for his 
own age. A further justification for giving the entire series 
is that insight was found to occur with reference to the use of 
the bevels. When this insight is delayed in the case of the 
duller subjects, their performance on the more complex boards 
may be relatively superior to that on the simpler boards. The 
subject should be credited with this improvement even though 
it be delayed. 

More adequate instructions also became necessary because 
of situations arising from the introduction of bevels for the first 
time with Board 3. Many subjects regarded a performance 
as complete when only one-half of the insert was correctly 
placed, the other half being incorrectly placed because the 
bevels were not fitted together. Since previous instructions 
did not permit the examiner to remove the board until the time 
limit of 5 minutes had elapsed, such incorrect placement of 
one-half of the insert produced an awkward delay during which 
the subject became impatient and ill-at-ease. A revision was 
attempted by requesting the subject to return the board when 
he considered it completed and make a record of this time but 
this also proved to be inadequate. It was found that if the 
subject were allowed to be satisfied with a performance on one 
board which included an incorrect use of the bevels he would 
not become motivated to attempt to discover their correct use 
in subsequent boards. It was found that the best performance 
was obtained by telling a subject who seemed satisfied with an 
incorrect performance (completed before the end of the time 
limit) that he must try some more as the blocks will fit in 
smoothly if they are correctly inserted. The exact wording 
to be used is given in the directions which follow. 

When, however, a performance is still incomplete or 
inaccurate at the end of the time limit the examiner should 
re-arrange or insert the pieces in front of the subject. The 
statement to be made to the subject at this time is also given in the 
instructions. It was found that this method serves to 
encourage the child and to dispel any suspicion that there is a trick 
in the test. 

As a result of the changes introduced during the preliminary 
testing, the following directions for administration were formu- 
lated and rigidly followed throughout the testing on which the 
norms are based. 

It was found that the presentation of the test could be 
facilitated by placing the board directly in front of the subject and 
the blocks to his right as shown in Figures 1, 2 and 3. To 
enable the examiner to handle the boards readily, it was found 
expedient for him to sit at right angles to the subject and to his 
right, i.e., on the side nearest the blocks. 


DIRECTIONS 


Give all the boards in the order 1 through 6 to all subjects 
(regardless of failure on one or more of the boards). Present 
one board at a time, placing the blocks at the right of the sub- 
ject, and the board directly in front of him. The removal and 
turning over of the blocks are always done so that the subject 
cannot observe them. The boards which have not yet been 
presented are kept either covered or concealed. 

The blocks are numbered from left to right in sequence, i.e., 
No. 1 is in the upper left-hand corner of the board and No. 6 in 
the lower right-hand corner. Place the blocks as indicated 
below (cf. also Figs. 1, 2, and 3) : 


Board 1: Place the blocks in two rows. In the upper 
row place the blocks from left to right in the order 1, 3, 5, 
and similarly in the lower row, blocks 2, 4, and 6, all in the 
lower row being reversed. 


L to R:   1   3   5 
          2   4   6 

Board 2: Place the blocks in 4 rows as follows: 

L to R:   1   3   5 
          2   4   6 
          5   3   1 
          6   4   2 


No turning over is necessary, but the smaller blocks are 
placed in the lower rows. 


Board 3: Place the blocks in the order indicated for Board 2, 
but turn over one of each pair so that the largest side of each 
block is uppermost (the bevels thus being hidden). 


Board 4: Place the blocks in the order indicated for Board 2, 
but turn over one of each pair. 














566 GRIFFITH W. WILLIAMS AND JANET LINES 


Board 5: Place the blocks in the order indicated for Board 2, 
but turn over the smaller of each pair. 


Board 6: Place the blocks in the order indicated for Board 2, 
but turn over one of each pair so that the larger side of each 
block is uppermost. 


It will be noticed (cf. Figs. 2 and 3) that the blocks which 
are turned over are placed in the lower two rows. None of the 
blocks in the first two rows is turned over. It has been found 
in practice that the easiest way to remember the sequence is to 
place the blocks in the sequence 1, 3, 5, and 2, 4, 6 in the upper 
rows from left to right (as in reading) and to use the same 
sequence (1, 3, 5, and 2, 4, 6) in the lower rows, but reversed, 
i.e., from right to left. In this way the blocks which are not 
turned over, are removed first, and all the blocks to be reversed 
are removed afterwards. 
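
The placement rule can be written down compactly. The sketch below returns the rows of block numbers as they lie beside the board, read from left to right, together with the rows whose blocks are turned over; the function names and the zero-based row indices are conventions adopted for this illustration only.

```python
def tray_layout(board):
    """Rows of block numbers, read left to right, as laid out beside the board.

    Board 1 has six single blocks in two rows; Boards 2 to 6 have twelve
    half blocks in four rows, the lower two rows running in reverse order.
    """
    if board not in range(1, 7):
        raise ValueError("boards are numbered 1 through 6")
    upper = [[1, 3, 5], [2, 4, 6]]
    if board == 1:
        return upper
    lower = [list(reversed(row)) for row in upper]
    return upper + lower

def turned_over_rows(board):
    """Zero-based indices of the rows whose blocks are turned over."""
    if board not in range(1, 7):
        raise ValueError("boards are numbered 1 through 6")
    if board == 1:
        return [1]        # all blocks in the lower row are reversed
    if board == 2:
        return []         # no turning over is necessary
    return [2, 3]         # Boards 3 to 6: the lower two rows
```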

When preliminary conversation has established a co-operative 
attitude, the following instructions should be given: ‘‘I am 
going to give you this board which has holes in it. These blocks 
fit into the holes. If you put them in correctly, they will all fit 
smoothly. Do you understand what you are to do? All right, 
then: I want you to fit them in as quickly as you can. Ready: 
go ahead.’’ After completion of each board the examiner 
says, ‘‘That’s fine,’’ and with the presentation of subsequent 
boards says, ‘‘Now do this one. Ready: go ahead.’’ 

If the subject seems satisfied with an incorrect performance 
on any of the boards before the end of the time limit, the 
examiner says, ‘‘You must try some more, for if you put them in 
correctly, they will all fit smoothly.’’ If the subject asks 
questions as to the correctness of his performance during the test, 
the examiner says, ‘‘You must do your best and decide for 
yourself.’’ 

If the performance is incomplete or incorrect at the end of 
the time limit, the examiner says, ‘‘That’s fine, but let me show 
you that all the blocks fit in smoothly,’’ and re-arranges or 
inserts the blocks so that the subject may see him do so. 

Scoring: The time limit for each board is 5 minutes. Record 
the time for each board in seconds and in case of incomplete 
or incorrect performance, the number of units correctly placed. 
When ‘‘scores’’ are used as the basis of judgment, the total 
number of units correctly placed is to be divided by the total 
amount of time taken to place them. 
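
The ‘‘score’’ just defined can be illustrated with a short Python sketch. It assumes one record per board giving the time in seconds and the number of units correctly placed, with six units to a board and a 300-second limit; the record format and the function name are hypothetical.

```python
TIME_LIMIT_SECONDS = 300   # 5 minutes per board
UNITS_PER_BOARD = 6        # each board has six spaces

def series_score(records):
    """Total units correctly placed divided by total time in seconds.

    `records` holds one (seconds, units_correct) pair for each of the
    six boards, all of which are given regardless of failure.
    """
    if len(records) != 6:
        raise ValueError("the full series of six boards is required")
    for seconds, units in records:
        if not 0 < seconds <= TIME_LIMIT_SECONDS:
            raise ValueError("time must fall within the 5-minute limit")
        if not 0 <= units <= UNITS_PER_BOARD:
            raise ValueError("a board has at most six units")
    total_units = sum(units for _, units in records)
    total_time = sum(seconds for seconds, _ in records)
    return total_units / total_time

# Hypothetical record: four boards completed, two left incomplete at the limit.
example = [(20, 6), (45, 6), (90, 6), (150, 6), (300, 4), (300, 2)]
score = series_score(example)   # 30 units in 905 seconds
```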













Current practice (1: p. 11) was followed in determining the 
chronological age of the subject, e.g., a child was regarded as 
5 years old from the time he is 5.0 until he is 5.99 years old, 
i.e., until he is within 3 days of his next birthday. While the 
method of regarding a chronological age of 5 as representing 
ages from 4.5 years to 5.49 offers what appears to be a superior 
identification of chronological age, sufficient justification for 
its use could not be found in the current literature on testing. 
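
The two conventions contrasted above differ only in where the year boundaries fall, as a brief Python sketch shows. Ages are taken here as decimal years, and the function names are illustrative.

```python
import math

def age_last_birthday(age_years):
    """Convention adopted in the study: 5.0 through 5.99 counts as age 5."""
    return math.floor(age_years)

def age_nearest_birthday(age_years):
    """Alternative convention discussed: 4.5 through 5.49 counts as age 5."""
    return math.floor(age_years + 0.5)
```

A child of 5.6 years is thus a five-year-old under the first convention and a six-year-old under the second.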

The validity of a performance test as a measure of intelli- 
gence is usually determined by a comparison of its results with 
those of a verbal test. The Kuhlmann-Anderson test (6) was 
selected for this purpose because of its discriminative capacity 
at the levels included in the test population of this experiment. 
The Kuhlmann-Anderson test was given as soon as the required 
number of subjects in any one grade had been given the per- 
formance test at either school and in no case was this more 
than 4 weeks after any subject had been given the performance 
test. 

The conditions under which the testing was done were excel- 
lent. A private room was provided in both schools so that 
there were no interruptions and no child saw the performance 
of another. The subjects seemed to like the test and no 
difficulty occurred in securing their cooperation, a brief 
preliminary conversation being all that was necessary to accomplish 
this. Comments of an incidental nature were also usually made 
during the time required to remove one board and present the 
next. 

DERIVATION OF NORMS 


Norms previously published for the Ferguson Form Boards 
(2: p. 127; 3: pp. 53-57) consist of values which are based 
on the time taken to complete each board. For example, on 
Board 1 a subject receives a value of 10 if the board is 
completed in from 1 to 16 seconds and so on down the scale to a 
value of 1 if he completes it within from 67 to 300 seconds. 
Comparable, though not identical, values are given for 
performance on the other boards and a subject’s score is the sum 
of these values. Neither study, however, provides an adequate 
justification of this method and the present study includes 
an attempt to determine the most adequate method of scoring 
and presenting the data. 

No attempt has been made to treat each board separately 
and consequently, the discussion which follows refers to per- 
formance on the entire series of 6 boards. 

Six possible methods of scoring were investigated: 

1. The total time multiplied by the total units (‘‘units’’ 
being the term designating the number of spaces 
correctly filled). 

2. The square root of time multiplied by units. 

3. The cube root of time multiplied by units. 

4. The total number of units. 

5. The total time. 

6. The total units divided by the total time (this ratio, for 
convenience, being called ‘‘score’’). 


These six scoring methods were judged by two criteria, 
namely, the degree to which they indicate growth as judged 
by the Growth Units of the Kuhlmann-Anderson test and the 
degree to which they discriminate the performance of various 
chronological ages. Table 1 gives the correlations that were 
used in the determination of the best scoring method. 

Multiplying the number of units by the total time proved
to be unsatisfactory, since the results failed to discriminate a
good performance from a poor one. A subject placing a small
number of units in a long time (both indicating poor perform-
ance) might have the same or even a larger value than one who
placed a larger number of units in a short time (both indi-
cating good performance). Furthermore, Table I shows that
when correlated with K-A Growth Units these values give
such variable correlations as +.236 ± .06 for the subjects in
grades 2 and 3 and −.229 ± .06 for the subjects in grades
4 and 5.
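A short numerical illustration may make the objection to the product concrete. The two subjects below are hypothetical (their times and units are assumptions chosen for the arithmetic, not cases from the study); the sketch shows that the product of time and units can be identical for a poor and a good performance, whereas the ratio of units to time separates them.

```python
def product_index(time_s: float, units: int) -> float:
    """Method 1: total time multiplied by total units."""
    return time_s * units


def ratio_score(time_s: float, units: int) -> float:
    """Method 6: total units divided by total time."""
    return units / time_s


# Hypothetical poor performance: few units, long time.
poor_time, poor_units = 1200.0, 12
# Hypothetical good performance: many units, short time.
good_time, good_units = 400.0, 36

poor_product = product_index(poor_time, poor_units)   # 14400.0
good_product = product_index(good_time, good_units)   # 14400.0

poor_ratio = ratio_score(poor_time, poor_units)
good_ratio = ratio_score(good_time, good_units)
```

Here both products equal 14,400, while the ratio for the good performance is nine times that for the poor one.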

An attempt to reduce the relatively large weight given to
time in this method by taking either the square root or cube
root of time instead of the actual time proved also to be unsatis-
factory. Neither discriminated satisfactorily between a good
and poor performance, and the correlations of these two time
values with the K-A Growth Units, as shown in Table I, indi-
cate that both are entirely unsatisfactory as indices of per-
formance on the form boards.

TABLE I
The Correlations Obtained to Determine the Method of Scoring
the Ferguson Form Boards
[Table printed sideways in the original; body not legible in this copy.]

The possibility of using the number of units satisfactorily
completed as an index is shown by the ratio of correlation
between units and chronological age where ηxy is +.634 ± .02.
The number of units completed by the experimental group
ranged, however, only from 10 to 36. Such a range would make
this an unsatisfactory index as the distribution within a grade
or age would be too limited to indicate variations in per-
formance.

The amount of time taken to complete the series gave a cor-
relation of −.630 ± .02 (ηxy) with chronological age. This
shows that an increase in age varies concomitantly with per-
formance (low time being an index of good performance).
Furthermore the correlation of time with K-A Growth Units
is −.735 ± .01 (ηyx). Such a correlation shows that total
time is an index of growth and can consequently be used as an
index of performance on the Ferguson Form Boards. An ex-
amination of Figure 9 also shows a satisfactory distribution,
i.e., of about 1600 seconds in time values on the series even
though these values have not been converted into equivalent
scale values as has previously been done in studies of this
performance test.

The use of time alone, however, recognizes only the speed of 
performance and disregards its quality. If time alone is taken, 
a subject who places 5 units correctly on a board within the 
maximum time is given the same credit as one who places none 
of them correctly, i.e., both are regarded as failures and are 
given the maximum time. Such being the case, an attempt was 
made to recognize both speed and quality of performance and 
in this way derive an index which would distinguish incomplete 
performance or partial success from complete failure. The 







high inverse relationship between units and time, indicated by
a ratio of correlation of −.901 ± .01 (ηyx), justified a combina-
tion of these two. Consequently a ratio obtained by dividing
units by time was derived and called “scores.” The fact that
these scores correlated with chronological age to the extent
of +.617 ± .02 and with the K-A Growth Units to the extent
of +.733 ± .01 shows that scores constitute as good an index
as time when judged by these standards. They, moreover,
give a range of about 1200 points for ages 5 to 17 and show a
progressive increase with both age and grade, thus indicating
a high discriminative capacity.

While neither score nor time shows a distinct superiority 
when judged statistically, the capacity of the scores to dis- 
criminate various degrees of failure seems to the authors to 
make this the preferable index of performance. 
































REPORT OF A STUDY DONE IN A 
WATCH FACTORY 


BEATRICE CANDEE anp MILTON BLUM 
Testing Division, Vocational Service for Juniors and 
New York State Employment Service 


FINGER and tweezer dexterity tests¹ were used by the Em-
ployment Service in selecting 118 girls for jobs in a watch
factory. The test scores used for elimination were of 
necessity merely estimates, since no testing had been done in the 
plant. They were based upon observations of the opera- 
tions in the plant, upon experience with the use of the finger 
dexterity in department store wrapping and packing, and upon 
wrapping and stripping operations in a chewing gum factory. 
In order to test these estimates the Watch Company coop- 
erated by making available for testing a group of workers in the 
factory and by having these workers rated by foremen. 

The employment manager was asked to secure for us his 
twenty best workers and his twenty poorest ones. These will be 
designated throughout as Group A and B, respectively. Tests 
were actually given to 20 people in Group A and 17 in Group 
B. It must be kept in mind that those in Group B, although 
less desirable than other workers in the plant, were still satis- 
factory and had been retained well past any try-out period. 
In addition to Groups A and B, 30 girls whom the State Em- 
ployment Service had placed at the plant were rated by the 
foremen and were available on the day of testing for retests. 
Of these thirty, one had been selected by the foreman for 
Group A and four for Group B. The retest group, therefore, 
shows this degree of overlapping with the other two groups.²


1 Tests used were the Johnson O’Connor finger and tweezer dexterity
tests. 

2 When these five occur in Group A and B, original test scores were used 
to make them comparable with the others in the group. 




Before any tests were given, the Employment Service re- 
ceived two ratings on each worker, one based entirely upon 
speed and one covering the individual’s value on the whole to 
the plant. These ratings were given by the foreman and were 
based upon a five-point scale, A, B, C, D, E, with C considered
as average. As would be expected, since both groups were 
made up of adequate workers, no E ratings and very few D’s 
were given. 

Before the study was begun it was understood by the plant, 
and made clear to the workers, that no report of individual test 
results would go back to the plant. This agreement seemed the 
best way of safeguarding the workers and of assuring good co- 
operation and a lack of nervousness on their part. The environ- 
mental conditions for testing were very favorable. 

For comparison with the workers in the watch plant we had 
available test scores on a number of other groups. Of these, 
the following four seemed to offer the most significant com- 
parisons: First, a group of 420 women was tested at the Queens 
branch of the Employment Service, the branch which supplies 
workers to the watch plant. These were nearly all young 
women and a very large proportion of them had been tested 
because they had some interest in work at the watch company 
in question. They are probably representative of the potential 
labor supply for this plant as selected by interviewers without 
the aid of tests. Second, a group of 100 young women tested in 
the industrial division of the Manhattan branch of the Employ- 
ment Service. This group is probably representative of the 
type of women in New York City who apply for jobs which 
require manual dexterity and who have actually had previous 
experience in work of this sort, largely as wrappers, packers, 
or assemblers. Third, four groups of girls from different Jun- 
ior offices in the Employment Service. These girls were 16 to 
21 years of age. A much greater proportion of these were with- 
out any work experience than is the case in the adult groups. 
Although most girls in the watch plant are over 21, this group 
was included on some of the comparisons to see whether youth 













and inexperience would have any noticeable effect on the test 
scores. Fourth, a group of 18 experienced and superior work- 
ers tested in March, 1935, in a chewing gum factory. These 
were girls employed in packing or stripping small packages of 
gum. The work is not nearly as fine as assembling at the 
watch plant, but does, nevertheless, represent a performance
of finger dexterity. 














The Results of the Study.
A. Time Scores.

TABLE I
Comparison of Total Times on the Finger Dexterity Test.

WATCH FACTORY EMPLOYEES                    AVER.    RANGE               SIGMA        SIGMA AV.

Group A (superior workers)                 6’55”    5’56” to 8’06”      34           7.6
Group B (mediocre workers)                 7’32”    5’57” to 9’49”      57           15.2
Group C (original tests of 30 additional
  people placed by NYSES)                  6’51”    5’50” to 8’35”      32           5.8
Group D (retests of Group C)³              6’23”    5’12” to 8’03”      43           8.0

Control Groups
  Adults from Queens                       7’43”    5’15” to 12’15”     69
  Adults from Manhattan                    7’33”    5’37” to 14’15”     73
  Juniors: Harlem                          7’40”    5’47” to 11’30”     56
           (illegible)                     7’40”    5’37” to 14’14”     (illegible)
           Brooklyn                        7’36”    5’22” to 10’52”     (illegible)
           Manhattan                       7’38”    5’37” to 14’19”     64.5

Experienced Workers at the Chewing
  Gum Factory                              6’53”    5’57” to 8’08”      36




















By comparing the various control groups with each other it is 
found that no reliable difference exists among them. The aver- 
age scores of junior and young adult applicants, even when 
the latter are supposedly experienced on dexterity jobs, are 
found to be similar. The difference in total time between


3 All people in the group had taken the Finger Dexterity test previously.













Group A and the control (adults from Queens) is statistically
reliable (D/σ Diff. equals 3.3).⁴ The difference between Group
B and the control is statistically unreliable (D/σ Diff. equals
.7).

Superior workers in the watch plant have a better average 
test performance than mediocre workers. While this difference 
is not quite statistically reliable (D/σ Diff. equals 2.18) the
chances that a true difference exists are 98 out of 100.
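The critical ratio for Groups A and B can be checked from the figures of Table I. The sketch below is offered under two assumptions that are ours and not the authors' stated procedure: that the sigma of the difference is the square root of the sum of the squared sigmas of the two averages, and that "chances in 100" is the one-tailed normal probability for the ratio. The function names are hypothetical.

```python
import math


def sigma_diff(sigma_av_1: float, sigma_av_2: float) -> float:
    """Standard error of a difference between two independent averages (assumed form)."""
    return math.sqrt(sigma_av_1 ** 2 + sigma_av_2 ** 2)


def critical_ratio(mean_1: float, mean_2: float,
                   sigma_av_1: float, sigma_av_2: float) -> float:
    """D / sigma Diff.: the difference of averages over its standard error."""
    return abs(mean_1 - mean_2) / sigma_diff(sigma_av_1, sigma_av_2)


def chances_in_100(ratio: float) -> float:
    """One-tailed normal probability, in per cent, for a given critical ratio."""
    return 100.0 * 0.5 * (1.0 + math.erf(ratio / math.sqrt(2.0)))


# Table I, finger dexterity, times in seconds.
group_a_mean = 6 * 60 + 55   # 6'55", sigma of average 7.6
group_b_mean = 7 * 60 + 32   # 7'32", sigma of average 15.2

ratio_ab = critical_ratio(group_a_mean, group_b_mean, 7.6, 15.2)
chances_ab = chances_in_100(ratio_ab)
```

Under these assumptions the ratio comes to about 2.18 and the chances to a little over 98 in 100, in agreement with the figures given in the text.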

None of the groups of employed workers has scores as low 
as some of those found in the control groups. The entire range 
of scores among workers in the watch plant is 4 minutes, 
whereas that of the Queens control group is 9 minutes. How- 
ever, the range of scores, even among the superior workers in 
the watch plant is quite wide. Although none of them goes down 
into the lowest ranges of the control groups, a few are found
down to or even a little below the average of these control 
groups. In the group of A and B workers in the plant making 
relatively slow test scores, the two slowest individuals are rated 
as unsatisfactory in speed by the foremen. However, two of 
the five people with double A ratings on work in the plant 
are among those with test scores down to the average of the 
applicants.

The rather wide range of test scores found among superior 
workers in the plant would indicate that in this case, as in 
practically all others, there are factors besides those measured 
on the test which are important for success on the job. In the 
selection of any single individual these other things such as 
temperamental fitness for the work as well as broader attitudes 
toward plant and home, social and economic stress, etc., must
be taken into consideration. 

The correlation between the foremen’s ratings and the total 
time on the finger dexterity test for groups A and B is plus 


4 $\sigma_{\text{Diff.}} = \sqrt{\sigma_{\text{av.}_1}^{2} + \sigma_{\text{av.}_2}^{2}}$


5 Ample evidence of this in such studies as Slocombe on Boston Elevated 
(Brit. J. of Psych. Vol. 21), and the work in the Hawthorne plant of the 
Western Electric Company (Elton Mayo, Human Problems of an Industrial 
Civilization, Macmillan, 1933).







.26 with a sigma of ±.115.⁶ This correlation is therefore
low and unreliable when ratings and total time on finger dex- 
terity test are compared. The crudeness of these latter ratings 
and particularly the limited range of scores both tend to 
reduce the correlation coefficient. However, one would never 
expect to predict foremen’s ratings merely on the total time 
of a dexterity test, even if both measures were true. Too many 
extraneous factors must always operate. The fact that even 
a low positive correlation exists in spite of these factors sup- 
ports the other evidence that total time on the test is of some 
importance for success on the job. In view of this it seemed 
worth while to make a more detailed analysis and study of the 
time scores. 

Due to the very small number of cases in these groups, all 
points discussed in regard to them must be considered merely 
as interesting suggestions for further work of this sort. How- 
ever, they seem to be worth noting. 

On the finger dexterity test, the mediocre group is more vari- 
able than the good group. The scores of the superior workers 
tend to cluster at 7 minutes or better, whereas those of the medi- 
ocre group are spread more evenly over the whole range down 
to 9 minutes. 

However, individual members of the superior group go down 
to 8 minutes. 

As a group, the workers called superior make faster test scores 
than those called mediocre, even though within the groups the 
test scores have little relation to the ratings of the foremen 
(correlation plus .26—see page 574). 

The greatest difference between the superior and mediocre 
workers is not shown by the fastest test scores but by those in 
the range of 6½ to 7 minutes, which is close to the average of
the group of plant workers. The people faster than 6½ minutes
are highly variable. Among them occur 25 per cent of the 
superior (A) workers but also 25 per cent of the mediocre (B) 
ones, and among this 25 per cent of mediocre workers are two 
out of the total five workers having D ratings by foremen in 

6 Product-moment correlation.



















speed in the plant. The people with D ratings may tend to 
have test scores in both extremes of the range of plant workers 
for two occur near the top and two at the bottom of the range 
in the plant. The fastest test performances in the plant show 
very little differentiation between the superior and mediocre 
workers, but in the interval between 6½ and 7 minutes, the dif-
ferentiation is marked. Test times of more than 7’10” have
greater probability of selecting mediocre rather than superior 
workers. On the other hand, test scores faster than 7’10” have 
greater probability of selecting superior workers. In this
study the probability is most favorable in the 6’30”
to 7’00” range. It must be remembered that the small number
of cases does not allow the drawing of definite conclusions. 

The indications, therefore, are that while speed on the Finger 
Dexterity test does seem to have some relation in selecting good 
from mediocre workers in the plant, that relationship is not as 
clearly marked in the highest ranges of the test scores as in 
those slightly slower. Speed as measured on the test is not 
alone sufficient to lift a worker out of the mediocre class, since 
25 per cent of the mediocre workers have very fast test scores. 
Likewise, relative slowness on the test will not prevent an indi-
vidual from becoming a superior worker, since two of these
workers make relatively slow test scores of 8 minutes.

Only one control population was available for the Tweezer
Test, as it has been less widely used by the Employment Ser-
vice. A real difference exists between Group A and the Con-
trol (D/σ Diff. 7.1) and between Group B and the Control
(D/σ Diff. 4.5). However, no statistically reliable difference
exists between Groups A and B (D/σ Diff. 1.01). The chances
are only 84 out of 100 that a true difference exists.

The groups in the plant include none of the slowest scores in 
the control groups, and in addition, their upper range is slightly 
higher.

The correlation between the foremen’s ratings and the total 
time on the Tweezer Dexterity is plus .027, which indicates that
no relationship at all exists. Despite the fact that no relation 










TABLE II
Comparison of Total Times on the Tweezer Dexterity Test

WATCH FACTORY EMPLOYEES                      AVER. FOR TEST    RANGE             SIGMA

Group A (Superior Workers)                   4’50”             3’42” to 5’54”    37.4”
Group B (Mediocre Workers)                   5’04”             3’46” to 6’49”    43.4”
Group C (Original tests of 30 additional
  workers placed by State)                   5’01”             4’05” to 6’52”    38.5”
Group D (Retests of Group C)                 4’30”             3’44” to 6’14”    36.2”
Control Group:
  (Adults from Queens)                       5’55”             4’00” to 9’45”    68”








is present between ratings and total time on the test, it seems 
advisable to investigate results on Tweezer Dexterity just as 
was done with Finger Dexterity. 

Although the scores of the A group do not go down quite as 
far as those of the B group, they show no tendency to cluster in 
the faster ranges. Both groups are widely distributed over the 
whole range of test scores found in the plant. This is shown 
in the averages. Both A and B groups have fast scores com- 
pared to the control. All in Group A are faster than the con- 
trol average. Only two are slower in Group B than the control 
average. 

The failure of the fastest test times to distinguish the supe- 
rior from the mediocre workers is even more marked on the 
Tweezer than on the Finger test. 

The most favorable area discriminating A and B workers is
5’08”. However, this score does not even approach statistical
reliability. D/σ Diff. of the percentage is only 1.24, which
means that the chances are only 88 in 100 of differentiating
A and B workers at the most favorable area.


B. Quality Ratings. 

Each person that takes the Finger and Tweezer test is rated 
by the examiner on her performance. This observation of per- 
formance is highly desirable as it yields additional information 
about the method used and the tension of the client in taking 










the test. These observations culminate in what is called the 
quality rating with a ‘‘3’’ considered as average and a possible 
range of from one to five. 











TABLE III
Distribution of Quality Ratings, Finger Dexterity Test

                      WATCH       WATCH       GROUPS      QUEENS
                      FACTORY     FACTORY     A & B       CONTROL
QUALITY               GROUP A     GROUP B     COMBINED    GROUP
                      %           %           %           %

1. (Excellent)        20          6.6         14.2        3.6
2. (Above aver.)      50          40          45.7        39.2
3. (Average)          30          53.3        40          50.9
4. (Below aver.)      0           0           0           5.8
5.                    0           0           0           .3














Ratings of below average are not found at all among the 
plant workers but they were also rarely used with the control 
group.⁷ The quality ratings as well as the time scores seem to
be valuable in differentiating between the applicants and the 
successful workers in the watch plant. Like the time scores 
they show some difference between the superior and mediocre 
workers in the plant on the Finger Dexterity. 

The correlation between total time and quality on the Finger 
Dexterity test is + .26. This correlation is low enough to indi- 
cate that quality on this test is largely independent of speed 
and, therefore, the favorable time scores made by the plant 
workers on this test do not in themselves account for the good 
quality ratings. It appears, therefore, that the quality rating
on the Finger Dexterity test is relatively independent of
total time on the test and should be
considered as a separate entity in the attempt to predict suc-
cessful performance on the job.

The correlation between time and quality on the Tweezer Dexterity
test in the control group is +.42. In view of this relation, quality on this
test cannot be judged to discriminate to the same degree as on

7 We have known that our ratings came rather high but have made no
great effort to change this since the group referred to us is selected for
probable placement which means that much of the lower range is omitted.











TABLE IV
Distribution of Quality Ratings, Tweezer Dexterity

                      WATCH       WATCH       GROUPS      QUEENS
                      FACTORY     FACTORY     A & B       CONTROL
QUALITY               GROUP A     GROUP B     COMBINED    GROUP
                      %           %           %           %

1. (Excellent)        35          26.6        31          5.9
2. (Above average)    45          60          51          50.1
3. (Average)          20          13.3        17          36.3
4. (Below average)    0           0           0           7.1
5.                    0           0           0           .5




















the Finger Dexterity. However, people with quality ratings 
below average apparently should not be considered for work. 


C. Improvement Shown on the Test. 

Except with very fast performers, the time taken to complete 
the second half of the test is usually less than that of the first 
half. However, our data in this study and in another involving 
a thousand cases, do not support O’Connor’s method of han- 
dling this improvement by adding a constant fraction, i.e., one- 
tenth of the second half. We have always found a constantly 
accelerating increase in the amount of improvement as the time 
on the first half lengthens. O’Connor’s practice of comparing 
the amount of improvement to the second half automatically 
tends to conceal the acceleration to some degree, but there seems 
to be no precedent whatever and no theoretical justification for 
computing improvement in relation to the repetition instead 
of to the initial material, since the time required on the repe- 
tition is a function of the improvement. 

It must be taken into account that capacity for improvement 
becomes negligible as one reaches the physiological limit, and, 
therefore, faster starting times are conducive to less improve- 
ment than slower starting times. This is established in Table V. 

The tests at the Watch Company seem to emphasize the im- 
portance of this factor of improvability for selecting workers 
in view of the improvement rates obtained by the superior plant 
workers, which are much greater than would be expected in a 
normal population. Group A improves by 23 seconds. This 






















TABLE V
Average Improvement for Each Time Interval, Queens Control Group

TIME ON FIRST HALF    AVERAGE IMPROVEMENT    NO. OF CASES

(illegible)           3.5                    40
(illegible)           2.2                    72
(illegible)           3.1                    65
(illegible)           9.1                    71
(illegible)           18.3                   34
(illegible)           22.                    15
(illegible)           32.                    7








is more than ten times the 2.2 seconds average improvement 
shown by the control for this test time. Group B improves 
by only 4 seconds, which is about the improvement expected 
from the control. 

Improvement on the test seems to be a particularly signifi- 
cant factor in the case of particular individuals whose ratings 
in the plant are higher than would be expected from their total 
speed on the test, for improvement is noticeably greater in 
people with good ratings and relatively slow tests than with 
people with the same test time and low ratings. 


TENTATIVE CONCLUSIONS 


1. Neither age nor experience in the watch plant seems to
affect the scores on the finger dexterity test. It would seem, 
therefore, that as far as the skill measured by this test is con- 
cerned, the plant may as well draw upon the relatively young 
or inexperienced group for its labor supply. However, other 
factors besides skill, which might be affected by age and experi- 
ence, need to be evaluated in adopting such a policy. 

2. Speed on the dexterity tests does tend to select successful 
workers in the plant. It seems possible to determine a critical 
score below which it is not advisable to hire. This is in line 
with the results of most studies on the use of tests in industry. 
However, the very fastest scores on the tests do not discrimi- 
nate between superior and mediocre workers nearly as well as 
do the slightly slower scores. 










3. Tweezer dexterity seems to be an essential skill for all who 
are to remain in the plant. It differentiates the plant workers 
from the general population better than does the finger dex- 
terity, but it does not differentiate between mediocre and 
superior workers as well as finger dexterity. 

4. Finger dexterity in addition to tweezer dexterity seems to 
be highly desirable for superior work in the plant. 

5. Amount of improvement shown on the finger dexterity 
seems to be important in choosing superior workers, particularly 
among those with test scores which are relatively slow in rela- 
tion to the plant group. 

6. Quality on finger dexterity is another matter to be consid- 
ered in hiring for the Watch Factory. 


RECOMMENDATIONS 


*1. An initial selection of workers should be made on the 
basis of the tweezer dexterity test. It seems very desirable that 
any girl hired should be able to make a score on this test of at 
least 5 minutes and 30 seconds and a quality rating of better 
than average. These standards apply to the first time she takes
the test as there is a practice effect after that. 

*2. In looking for superior workers, it seems valuable to use 
also the finger dexterity test. A time of 7 minutes and a half or 
faster with a quality rating of average or better is recom- 
mended. In addition, improvement shown on this test should 
be considered, particularly for people in the slower range of 
still acceptable scores. Further study on a larger number of 
cases is necessary for exact recommendations on amount of
improvement. 


* 5’30” and 7’30” are chosen to allow for the human factor of motivation,
also the availability of the labor supply. The data of Groups A and B in-
dicate that 5’08” on the Tweezer Dexterity and 7’10” on the Finger Dexterity will
be the most valid critical scores in differentiating A and B workers. How-
ever, in view of the limited number of people in Groups A and B, it is sug-
gested that 5’30” and 7’30” be used as critical scores with a further investi-
gation to be repeated at a later date.
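The two recommendations amount to a pair of simple screening rules, which may be sketched as follows. The function and constant names are hypothetical, and the coding of quality ratings as integers from 1 (excellent) to 5, with 3 as average, follows the description of the quality rating given earlier; improvement on the test, which the second recommendation also asks to be considered, is left out because no exact standard is given for it.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a test time given in minutes and seconds to seconds."""
    return 60 * minutes + seconds


TWEEZER_CRITICAL_TIME = to_seconds(5, 30)   # recommended critical score, first trial
FINGER_CRITICAL_TIME = to_seconds(7, 30)    # recommended critical score
AVERAGE_QUALITY = 3                         # 1 = excellent ... 5 = lowest


def passes_initial_selection(tweezer_time_s: int, tweezer_quality: int) -> bool:
    """Recommendation 1: 5'30" or faster and quality better than average."""
    return (tweezer_time_s <= TWEEZER_CRITICAL_TIME
            and tweezer_quality < AVERAGE_QUALITY)


def is_superior_candidate(finger_time_s: int, finger_quality: int) -> bool:
    """Recommendation 2: 7'30" or faster and quality average or better."""
    return (finger_time_s <= FINGER_CRITICAL_TIME
            and finger_quality <= AVERAGE_QUALITY)


# Hypothetical applicant: tweezer 5'10" with quality 2, finger 7'40" with quality 3.
hired = passes_initial_selection(to_seconds(5, 10), 2)
superior = is_superior_candidate(to_seconds(7, 40), 3)
```

The hypothetical applicant meets the tweezer standard but falls ten seconds short of the finger dexterity standard.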

















PREDICTION OF THE SCHOLARSHIP OF 
FRESHMAN MEN BY TESTS OF 
LISTENING AND LEARNING 
ABILITY 


ROBERT B. SELOVER 
University of Minnesota 
AND 
JAMES P. PORTER 
Ohio University


THIS study was undertaken at Ohio University during the
year 1935-36 in an effort to determine how much the
prediction of scholastic success of freshman men could
be improved by certain objective tests. Special consideration
was given to tests of listening and learning ability constructed
at Ohio University.

Two groups of subjects were used. The first group was com- 
posed of seventy-two freshman men who had been given the 
Ohio State University Psychological Test, Form 18. The sec- 
ond group was composed of seventy-seven freshman men who 
were given the Ohio University Learning Ability Test. Both 
of these examinations are work-limit rather than time-limit 
tests. 

In both groups point hour ratio* is used as the criterion 
variable and is referred to throughout as variable 1. Variable 
2 in Group I is the Ohio State University Psychological Test, 
Form 18, the number of the form denoting the number of revi- 
sions of the test. Variable 2 in Group II is the Ohio University 


* Point hour ratio is a measure of scholastic success calculated by divid-
ing the total number of hours carried into the total number of points
earned. One hour of A is equal to three points. One hour of B is equal
to two points. One hour of C is equal to one point. No points are given
for grades of D and F. Thus a C average gives a point hour ratio of
1.000.
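The computation described in this footnote may be illustrated by a short sketch. The function name `point_hour_ratio` and the sample schedule of courses are hypothetical; the point values per hour are those stated above.

```python
# Points earned per credit hour for each letter grade, as stated in the footnote.
POINTS_PER_HOUR = {"A": 3, "B": 2, "C": 1, "D": 0, "F": 0}


def point_hour_ratio(courses):
    """Total points earned divided by total hours carried.

    courses -- list of (credit_hours, letter_grade) pairs
    """
    total_hours = sum(hours for hours, _ in courses)
    if total_hours == 0:
        raise ValueError("no hours carried")
    total_points = sum(hours * POINTS_PER_HOUR[grade] for hours, grade in courses)
    return total_points / total_hours


# A straight C record gives 1.000, as the footnote states.
c_average = point_hour_ratio([(3, "C"), (4, "C"), (5, "C")])

# Hypothetical mixed record: 20 points over 15 hours.
mixed = point_hour_ratio([(3, "A"), (3, "C"), (4, "B"), (5, "F")])
```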




Learning Ability Test, which is now in its second revision. In 
the construction of this test an attempt was made to place less 
emphasis on linguistic ability than is commonly found in tests 
of college ability. The items include perspective views of cubes 
to be counted by the subject ; certain sign-, number-, and letter- 
combinations to be matched; the learning and use of a code; 
and particular areas to be located from superimposed geo- 
metric figures. 

Variable 3 in both groups is the Ohio University Listening 
Ability Test, the purpose of which is to measure how well a 
subject can recall material which has been read to him. The 
test is composed of one hundred multiple choice items which 
are answered by indicating yes, no, or did not say after the first 
six pages of a recent book have been read. 

Variable 4 in both groups is the total score on the Nelson- 
Denny Reading Test for colleges and senior high schools, Form 
A. There are two parts to this test. The first is a test of 
vocabulary ; the second, a test of paragraph reading. 

Variable 5 in both groups is the Barrett-Ryan English Test 
for Grades IX—XII and college Form X. This test has three 
parts: sentence structure and diction, grammatical forms, and 
punctuation. The subject is given forty minutes to complete 
one hundred fifty objective items. 

All tests were administered in the fall of 1935. Raw scores 
are used in every case. 

The results obtained from Group I are summarized in Tables 
Ia and Ib. In general, these results need little interpretation. 
Variable 2 (the Ohio State University Psychological Test) is
the highest single predictive item for this group (r = .635).
The intercorrelation between the Ohio State University Psy-
chological Test and the Nelson-Denny Reading Test (variables
2 and 4) is extremely high (r = .777). This is best explained
by the similarity of material covered in the two tests. From 
the partial correlations reported in Table Ib we notice that 
with intelligence, as measured by the Ohio State University 
Psychological Test, held constant, a significant relationship 













TABLE Ia

Zero Order Intercorrelations, Coefficients of Reliability, and
Correlations Corrected for Attenuation

VARIABLE      1        2        3        4        5

1                      .635     .533     .614     .485
2                      .935     .657     .777     .663
3                      .758     .804     .656     .452
4                      .841     .765     .914     .560
Mean          .951     62.9     53.8     74.8     85.7
S.D.          .633     22.0     10.0     22.6     15.4

N = 72




















Note: The figures on the diagonal are coefficients of reliability; those 
to the right are uncorrected zero order inter-correlations; those 
to the left are zero order correlations corrected for attenuation. 
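The corrected entries to the left of the diagonal can be checked against the uncorrected ones. The sketch below assumes the customary correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliability coefficients; the function name is hypothetical, and the input figures are those of Table Ia.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Observed correlation divided by the root of the product of reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Reliability coefficients from the diagonal of Table Ia.
REL_2, REL_3, REL_4 = 0.935, 0.804, 0.914

corrected_23 = correct_for_attenuation(0.657, REL_2, REL_3)
corrected_24 = correct_for_attenuation(0.777, REL_2, REL_4)
corrected_34 = correct_for_attenuation(0.656, REL_3, REL_4)
```

Under this assumption the three values agree, to the third decimal, with the tabled entries of .758, .841, and .765.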








TABLE Ib
Partial and Multiple Correlations and Partial Beta Coefficients

PARTIAL AND MULTIPLE    BETA                  STANDARD        β/σβ
CORRELATIONS            COEFFICIENTS          ERRORS

r 34.2 = .307           β 12.345 = .2948      σβ = .1658      t = 1.778
r 35.2 = .029           β 13.245 = .1387      σβ = .1258      t = 1.103
r 45.2 = .095           β 14.235 = .2432      σβ = .1513      t = 1.607
R 1.2345 = .674         β 15.234 = .0906      σβ = .1211      t =  .748














Variable 1. Total point hour ratio
Variable 2. Raw score Ohio State Univ. Psychological Test
Variable 3. Raw score Ohio Univ. Listening Ability Test
Variable 4. Raw score Nelson-Denny Reading Test
Variable 5. Raw score Barrett-Ryan English Test


remains between the Listening Ability and Reading tests
(r 34.2 = .307). The correlations for the other combinations of
tests are reduced nearly to zero when the Ohio State Psychological
Test is ‘‘partialed out.’’ This fact may be of little importance in the
prediction of scholastic success but is interesting in the analysis
of a test purporting to measure listening ability. It may mean
that the two tests, Listening Ability and Reading, are measuring
essentially the same thing by different methods. It is also
noticed that none of the t values of the partial beta coefficients
reaches a significant level (a probability of .05). This would
indicate that the evidence from this sample is insufficient to
show the presence of a unique element in any one of the four
tests as combined in this battery.
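
The partial correlations reported for Group I can be checked from the zero order coefficients. The sketch below assumes the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)); the paper does not print the formula it used, and the function name is illustrative only.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Zero order correlations for Group I, copied from Table Ia.
r23, r24, r25 = 0.657, 0.777, 0.663   # variable 2 with variables 3, 4, 5
r34, r35, r45 = 0.656, 0.452, 0.560   # pairs among variables 3, 4, 5

# Variable 2 (Ohio State University Psychological Test) held constant.
r34_2 = partial_r(r34, r23, r24)
r35_2 = partial_r(r35, r23, r25)
r45_2 = partial_r(r45, r24, r25)

print(f"r 34.2 = {r34_2:.3f}")
print(f"r 35.2 = {r35_2:.3f}")
print(f"r 45.2 = {r45_2:.3f}")
```

Under this assumption the three values agree with the reported .307, .029 and .095 to three decimals.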









The multiple correlation for this group, R 1.2345, is .674, 
which gives a prediction of 26 per cent better than chance. 
The highest single predictive item of this group, the Ohio State 
University Psychological Test, gives a prediction 23 per cent 
better than guessing the mean. 
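
The percentages quoted above are consistent with the index of forecasting efficiency, E = 1 - sqrt(1 - r^2), which expresses how much the standard error of estimate is reduced relative to guessing the mean. The paper does not name the formula, so its use here is an assumption; the function name is hypothetical.

```python
from math import sqrt

def forecasting_efficiency(r):
    """Proportional reduction in the standard error of estimate for a correlation r."""
    return 1 - sqrt(1 - r**2)

# Coefficients for Group I as reported in the text.
multiple_R = 0.674    # R 1.2345
best_single_r = 0.635  # variable 2 with the criterion

for label, r in [("R 1.2345", multiple_R), ("r 12", best_single_r)]:
    print(f"{label}: {100 * forecasting_efficiency(r):.1f} per cent better than chance")
```

Rounded to whole numbers, these come to 26 and 23 per cent, matching the figures given for the battery and for the best single test.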

The results obtained from Group II are summarized in Tables
IIa and IIb. The intercorrelation between variables 2 and 3


TABLE IIa

Zero Order Intercorrelations, Coefficients of Reliability, and
Correlations Corrected for Attenuation

VARIABLE        1        2        3        4        5
1                      .581     .531     .603     .547
2                      .916     .426     .680     .563
3                      .491     .822     .551     .391
4                      .743     .614     .914     .674
[Mean]       1.122    105.6     54.2     74.8     89.2
[S.D.]        .679     15.1     11.0     28.2     20.5

N = 77











Note: The figures on the diagonal are coefficients of reliability; those 
to the right are zero order inter-correlations; those to the left 
are inter-correlations corrected for attenuation. 




















TABLE IIb

Partial and Multiple Correlations and Beta Coefficients

PARTIAL AND MULTIPLE    PARTIAL BETA           STANDARD       β / σβ
CORRELATIONS            COEFFICIENTS           ERRORS
r 34.2   = .394         β 12.345 = .2543       σβ = .117      t = 2.174
r 35.2   = .202         β 13.245 = .2611       σβ = .101      t = 2.585
r 45.2   = .480         β 14.235 = .1518       σβ = .140      t = 1.084
R 1.2345 = .698         β 15.234 = .1994       σβ = .117      t = 1.704

Variable 1. Total point hour ratio
Variable 2. Raw score Ohio Univ. Learning Ability Test
Variable 3. Raw score Ohio Univ. Listening Ability Test
Variable 4. Raw score Nelson-Denny Reading Test
Variable 5. Raw score Barrett-Ryan English Test


(the Ohio University Learning Ability Test and the Ohio Uni- 
versity Listening Ability Test) is low (r 23=.426). This cor- 
relation, when corrected for attenuation, still remains less than 
either of the correlations between these two variables and the 
criterion. If we accept the Reading and English tests as measures
of linguistic ability, the partial correlation between
Reading and English with Learning Ability held constant gives
some evidence of less emphasis on linguistic ability in the Ohio
University Learning Ability Test. This correlation (r 45.2)
equals .480, which is considerably higher than the correlation
between these same tests, Reading and English, when the Ohio
State University Psychological Test is partialed out (r 45.2 in
Group I = .095).
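
The statement about the corrected correlation between Learning Ability and Listening Ability can be reproduced with the usual correction for attenuation, r_c = r_xy / sqrt(r_xx r_yy), where r_xx and r_yy are the reliability coefficients on the diagonal of Table IIa. That this is the formula the authors applied is an assumption, and the function name is hypothetical.

```python
from math import sqrt

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores from an observed correlation."""
    return r_xy / sqrt(rel_x * rel_y)

# Group II values copied from Table IIa.
r23 = 0.426                 # Learning Ability with Listening Ability
rel_2, rel_3 = 0.916, 0.822  # reliabilities of variables 2 and 3
r12, r13 = 0.581, 0.531      # each test with the criterion (uncorrected)

corrected_23 = correct_for_attenuation(r23, rel_2, rel_3)
print(f"corrected r 23 = {corrected_23:.3f}")
print("below both criterion correlations:", corrected_23 < min(r12, r13))
```

The corrected value comes to about .491, which is the figure entered to the left of the diagonal and is smaller than either .581 or .531.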

The t values for B 12.345 and B 13.245 exceed the lower level 
of significance, thus indicating that the contribution of each 
of these two tests (the Ohio University Learning Ability Test 
and the Ohio University Listening Ability Test) is sufficiently 
unique to make them both valuable in this battery. 
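
The t values in Table IIb are the ratios of each partial beta coefficient to its standard error. A minimal sketch of that check follows; the critical value is an assumption (two-tailed .05 level at roughly 72 degrees of freedom, taking N = 77 less five variables), since the paper cites Fisher's table without stating the degrees of freedom used.

```python
# (beta, standard error) pairs copied from Table IIb.
TABLE_IIB = {
    "12.345": (0.2543, 0.117),
    "13.245": (0.2611, 0.101),
    "14.235": (0.1518, 0.140),
    "15.234": (0.1994, 0.117),
}

# Assumed critical value: tabled t at the .05 level for about 72 degrees of freedom.
CRITICAL_T = 1.99

def t_ratios(table):
    """Return beta divided by its standard error for every coefficient."""
    return {name: beta / se for name, (beta, se) in table.items()}

ratios = t_ratios(TABLE_IIB)
significant = [name for name, t in ratios.items() if t > CRITICAL_T]

for name, t in ratios.items():
    print(f"beta {name}: t = {t:.3f}")
print("exceed the .05 level:", significant)
```

Only the coefficients for variables 2 and 3 pass the assumed threshold, in line with the interpretation given in the text.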

The multiple correlation for this group (R 1.2345) is .698 
which gives a prediction of 28 per cent better than chance. 
Reading, the highest single predictive item in this group has a 
predictive value 20 per cent better than guessing the mean. 

A further, and perhaps more practical, method of interpret-
ing how well predictions can be made is in terms of a ‘‘critical
score.’’ Success can be defined as any point above a given
value in the range of possible achievement, and a group is pre-
dicted to fail if its predicted scores fall below a given threshold,
or critical score. Such a method has been employed for Group
II. Success was defined as a C average and point hour ratios
were predicted for each subject. When the critical score is 
defined as a 1.000 point hour ratio, 77 per cent of the group 
predicted to fail fall below the criterion of success. Sixty- 
eight per cent of the group predicted to succeed have point 
hour ratios equal to or better than a C average. 

If the definition of success is unchanged but the critical score 
lowered to a point hour ratio of .75, 84 per cent of the group 
predicted to fail fall below a C average. Only 16 per cent of 
this group achieve better than a C average. If the critical 
score is lowered to .50 the total group achieve less than a C 
average. The number of cases included in this last group is 
extremely small. Only seven cases were predicted to receive a 













point hour ratio of .50 or below. Twenty cases were predicted 
to receive a point hour ratio of .75 or less. Thirty cases were 
predicted to receive a point hour ratio of 1.00 or less. 
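
The critical score tabulation described above can be sketched as follows. The data are hypothetical and serve only to show the bookkeeping; the study's individual records are not reproduced in the paper. Treating a C average as a point hour ratio of 1.00 is also an assumption, as are the function and key names.

```python
def critical_score_table(predicted, actual, critical, success=1.0):
    """Compare predicted failure (predicted <= critical) with actual outcome."""
    flagged = [a for p, a in zip(predicted, actual) if p <= critical]
    passed = [a for p, a in zip(predicted, actual) if p > critical]

    def pct(hits, group):
        return 100 * hits / len(group) if group else float("nan")

    return {
        "n_predicted_fail": len(flagged),
        # Share of the predicted-to-fail group that actually fell below success.
        "pct_flagged_below_success": pct(sum(a < success for a in flagged), flagged),
        # Share of the predicted-to-succeed group that met or exceeded success.
        "pct_passed_at_or_above_success": pct(sum(a >= success for a in passed), passed),
    }

# Hypothetical predicted and obtained point hour ratios (illustration only).
predicted = [0.4, 0.7, 0.9, 1.1, 1.3, 1.6]
actual = [0.3, 0.8, 1.2, 0.9, 1.4, 1.8]

at_100 = critical_score_table(predicted, actual, critical=1.00)
at_075 = critical_score_table(predicted, actual, critical=0.75)
print(at_100)
print(at_075)
```

Lowering the critical score shrinks the group predicted to fail and, in data like these, raises the share of that group which actually falls below the criterion of success, the trade-off the paragraph describes.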

The practical value of such a method becomes obvious. If 
we wish to select a group to be referred to a counselor for 
advice in laying out an educational program a less rigid method 
of selection is demanded. A group, 77 per cent of whom will 
achieve less than a C average might well profit from such 
advice. Such information used wisely would prove no handi- 
cap to those 23 per cent wrongly predicted to fail. If we wish 
to segregate a group for special instructions, a higher accuracy 
in prediction is demanded. 


REFERENCES 

1. EZEKIEL, MORDECAI. Methods of Correlation Analysis, John Wiley &
Sons, Inc., 1930, pp. 183-184.

2. FISHER, R. A. Statistical Methods for Research Workers, 6th Edition,
Oliver and Boyd, 1936. Table IV, Table of t.

3. GUILFORD, J. P. Psychometric Methods, McGraw-Hill Co., Inc., 1936,
pp. 392-393.

4. WALLACE, H. A., AND SNEDECOR, GEORGE W. Correlation and Machine
Calculation, Iowa State College Official Publication. Revised
edition (1931), pp. 68-69.














A STUDY OF PUBLIC RELATIONS AND 
SOCIAL ATTITUDES 


THE PSYCHOLOGICAL CORPORATION 
New York City 


THIS report deals with the reactions of adults toward
certain phases of the following subjects:

Five Large Corporations 

Large Companies as against Small Companies 

Sit-Down Strikes 

The Supreme Court 

Hugo Black 

Religion 

Government in Business 

Communism and Fascism 

Outstanding Men. 


For the past six years the Psychological Corporation has been 
making scientific studies of sampling or obtaining accurate 
cross-sections of the public through personal interviews. These 
studies have been made in three ways: private studies for com- 
mercial clients, the bi-monthly Psychological Brand Barometer 
Studies, and experimental studies. In the Brand Barometer 
series, a total of 37 nation-wide studies has now been made, 
each in 47 cities and towns throughout the country, and total- 
ing over 150,000 personal interviews. As many or more per- 
sonal interviews have been made in private commercial studies. 

The present study is one of a series of experimental studies, 
such as the Corporation has been conducting at frequent inter- 
vals. The purpose of these experimental studies is to develop 
more adequate psychological techniques for the discovering and 
measurement of advertising effectiveness, influences affecting 



buying behavior, and social trends in the broad field of public 
relations. The study reported here is one of those devoted 
primarily to the study of public relations. This study repeats 
substantially a similar study made between February 18 and 
March 2, 1937. It was made between October 9 and 19, 1937,
with the cooperation of 70 psychologists associated with the 
Psychological Corporation. Each of these psychologists su- 
pervised a certain number of interviews in his city or town, 
the interviewers being graduate and under-graduate students 
of psychology. A total of 536 interviewers completed a total 
of 5000 interviews, averaging 9 interviews each. All inter- 
views were made in the home and either with the husband or 
wife, about half being made with each. Tabulations by sex 
show that there are no important differences. 

These interviews were distributed geographically so as to 
give a proportionate representation in the southern states, the 
eastern states, the middle west and the far west. They were 
also distributed so as to give an accurate cross-section by 
economic status, as follows : 


10% of the interviews were made in the A economic 
group, family incomes of $4000 a year or over 

30% in the B economic group, family incomes of 
$2000 to $4000 a year 

40% in the C economic group, family incomes of 
$1000 to $2000 a year 

20% in the D economic group, family incomes below 
$1000. 


This distribution is substantially in accord with the reports of 
the Brookings Institute for the urban population as of 1929. 
Since that time there have, of course, been changes in income 
levels but since our interviews are distributed in each city by 
economic sections or territories, the highest ten per cent of the 
population is still highest even though its average income may 
be lower or higher than the figure given, and this holds true 
for the remaining groups. The two studies reported here are 
comparable in that both are based on about 5000 interviews 













in almost the same 70 cities and towns. Differences between 
geographical groups were found to be not nearly so important 
as were differences between economic groups. 


ATTITUDE TOWARD LARGE CORPORATION 


One of the questions of great social interest and import is 
that with respect to large corporations and their place in 
American life. As a basis for subsequent questions in respect 
to large corporations, the study asked for the opinions of people 
in respect to a list of specific companies, namely:


Ford Motors
U. S. Steel
General Electric
General Motors
Chrysler


The results given below do not correspond with the order in 
which these companies are named. 


Question: Which of these companies do you think well of generally, and 
which not so well? (name each company and check) 








                        Total        Total
                      Feb. 1937    Oct. 1937
                          %            %
Company A
  Well                   64           56
  Not so well            20           17
  Don’t know             16           27
  Total %               100          100
Company B
  Well                   86           85
  Not so well             9            7
  Don’t know              5            8
  Total %               100          100
Company C
  Well                   78           72
  Not so well            18           20
  Don’t know              4            8
  Total %               100          100
Company D
  Well                   77           72
  Not so well            14           13
  Don’t know              9           15
  Total %               100          100
Company E
  Well                   72           78
  Not so well            22           12
  Don’t know              6           10
  Total %               100          100


The form of question used here is by no means the only form in 
which we have approached this problem in other studies. This 
particular form of question was being experimented with in 
comparison with other forms used. However, in the two 
studies reported above, the form of question was exactly the 
same. It will be seen that some very definite changes in the 
opinions of particular companies have taken place in seven
months. The proportion of people who think favorably of 
Company E has increased, whereas the proportion of people 
who think favorably of Companies A, C and D has definitely 
decreased. The facts in connection with these changes, when 
the names of the companies are known, make these changes seem 
quite plausible. 


LARGE CORPORATIONS, GOOD OR BAD? 

The published results of certain studies made in this field, 
and many of the public statements by public characters, have 
indicated the belief that large corporations are a menace rather 
than a benefit in our national life. Following is the expressed 
attitude of the urban public on this question. 


Question: Do you believe that large companies like these do more good 
than harm or more harm than good? 








                          Total        Total
                        Feb. 1937    Oct. 1937
                            %            %
More good than harm        75           71
More harm than good        14           12
Don’t know                 11           17
Total %                   100          100
Total interviews         4402         5000











By Economic Groups, Oct. 1937

                           A      B      C      D
                           %      %      %      %
More good than harm       84     74     69     63
More harm than good        7     11     13     16
Don’t know                 9     15     18     21
Total %                  100    100    100    100


From these results it seems that there is a great preponderance 
of adults who believe that large corporations like those men- 
tioned do more good than harm, and this is true even in the 
lower economic groups. However, it may also be significant 
that this proportion is lower in October than in February. The 
difference is not yet large enough to be statistically important, 
but will certainly be significant if this trend continues in sub- 
sequent studies. 
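
The October totals can be checked against the breakdown by economic groups. If each group is weighted by the quota proportions stated earlier (10, 30, 40 and 20 per cent for groups A to D), the weighted averages reproduce the Total column. The sketch assumes the totals are simple quota-weighted averages, which the report does not state explicitly; the names are illustrative.

```python
# Quota proportions for the four economic groups, as given in the report.
QUOTA = {"A": 0.10, "B": 0.30, "C": 0.40, "D": 0.20}

def weighted_total(by_group, quota=QUOTA):
    """Quota-weighted average of group percentages."""
    return sum(quota[group] * pct for group, pct in by_group.items())

# October 1937 percentages by economic group for the good-or-harm question.
more_good = {"A": 84, "B": 74, "C": 69, "D": 63}
more_harm = {"A": 7, "B": 11, "C": 13, "D": 16}
dont_know = {"A": 9, "B": 15, "C": 18, "D": 21}

totals = [round(weighted_total(row)) for row in (more_good, more_harm, dont_know)]
print(totals)
```

The rounded results, 71, 12 and 17, agree with the October totals reported for this question.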


LARGE VS. SMALL COMPANIES 


There has also been a strong tendency toward deprecating 
large companies at the expense of small companies, and much 
of the pressure in the field of company regulation and the regu- 
lation of labor practices has been directed against the large 
companies. The following question has to do primarily with 
the subject of employees and their treatment. 


Question: In general, do big companies or small companies treat their 
workers better? 


                      Total        Total
                    Feb. 1937    Oct. 1937
                        %            %
Big companies          54           51
Small companies        34           30
Don’t know             12           19
Total %               100          100

By Economic Groups, Oct. 1937

                       A      B      C      D
                       %      %      %      %
Big companies         62     53     48     48
Small companies       19     28     33     30
Don’t know            19     19     19     22
Total %              100    100    100    100
















Here again we find a preponderance of people who believe that 
large companies treat their workers better than do the small 
companies. While the results vary by economic groups, this
preponderance holds even in the lower economic groups. The fact
that a considerable number of these people work in large or 
small companies does not alter the significance of these results. 
However, it would be extremely interesting if we knew which 
of these people work in large companies and which work in 
small companies and then could tabulate their answers in terms 
of these facts. 

Here again there seems to be an increase in uncertainty,
with attitudes less definite than they were in the
February study. The above questions are by no means exhaus-
tive and do not go into the details of the specific attitudes which
make up these more general attitudes. We have confined our- 
selves, in these two studies, rather to an over-all or comprehen- 
sive attitude, although in some of our private studies we go 
into much further detail. 


SIT-DOWN STRIKES 


When the February study was made, the sit-down strike 
technique was having its first great dramatic expression, 
namely, the General Motors strike. 


Question: Do you believe that sit-down strikes are right or wrong? 




















                  Total        Total
                Feb. 1937    Oct. 1937
                    %            %
Right              23           18
Wrong              62           70
Don’t know         15           12
Total %           100          100

By Economic Groups, Oct. 1937

                   A      B      C      D
                   %      %      %      %
Right              7     13     18     29
Wrong             86     78     69     52
Don’t know         7      9     13     19
Total %          100    100    100    100













The above results indicate a sharp increase in the proportion 
of people who think that sit-down strikes are wrong, a decrease 
in those who think they are right, and a decrease in those who 
were uncertain. However, the interesting fact is that even
at the height of this technique, when many important officials 
and public characters were doubtful or even avowedly in favor 
of the sit-down strike technique, the large majority of the 
people believed that sit-down strikes were wrong. Then, as 
now, this was true even of the large C and D groups constitut- 
ing sixty per cent of the population. 


THE SUPREME COURT 


When the first of these two studies was made, the proposal to 
add six new judges to the Supreme Court was a very live issue. 
It was considered desirable to discover the attitude of the public 
now that this issue has been temporarily disposed of. 


Question: Do you believe that the United States Supreme Court should 
have six new judges? 























                  Total        Total
                Feb. 1937    Oct. 1937
                    %            %
Yes                34           25
No                 47           51
Don’t know         19           24
Total %           100          100

By Economic Groups, Oct. 1937

                   A      B      C      D
                   %      %      %      %
Yes               15     21     28     31
No                74     63     47     31
Don’t know        11     16     25     38
Total %          100    100    100    100





It will be seen that the above question is asked in an extremely 
matter of fact way without any element of suggestion, and with
no reference to the age of the judges. Even in February, a 
larger proportion of the people were against this proposal than 
for it, and in the October study these proportions are two to 







one. As is usually the case when public opinion is in the 
process of transition, there is first a significant increase in the 
percentage of people who answer ‘‘don’t know’’ and who in 
later studies swing more definitely into one of the positive 
categories. 


HUGO BLACK 


In view of the great interest in the appointment of Hugo 
Black, and the controversy which has waged around his ap- 
pointment, the following question has particular interest. 


Question: Do you believe that Hugo Black is a proper man to have as a 
Supreme Court Judge? 
                          Economic Groups
              Total     A      B      C      D
                %       %      %      %      %
Yes             31     22     31     34     28
No              41     63     45     36     36
Don’t know      28     15     24     30     36
Total %        100    100    100    100    100


The attitude toward the Hugo Black appointment does not 
show the great moral indignation some might expect. Never- 
theless, the number against his appointment is definitely higher 
than the number who approve. In this case we show results 
by geographical sections, because of their special significance. 


                        East.     Mid      South.    West.
              Total     States    West     States    States
                %         %        %         %         %
Yes             31        25       31        38        38
No              41        48       43        23        38
Don’t know      28        27       26        39        24
Total %        100       100      100       100       100


RELIGION


The questions in respect to religion, especially the second 
question, may be regarded as leading. However, even leading 
questions have their value in studies of this kind, especially 
when they are repeated so as to give us a measure of trends. 







Question: Do you think that religion is losing or gaining influence in the 
United States? 


                  Total        Total
                Feb. 1937    Oct. 1937
                    %            %
Losing             44           44
Gaining            34           28
Neither            13           13
Don’t know          9           15
Total %           100          100


Question: Do you think that religion should have more influence in the 
United States? 
                  Total        Total
                Feb. 1937    Oct. 1937
                    %            %
[Yes]              76           76
[No]               16           14
Don’t know          8           11
Total %           100          100














The results above indicate, if anything, a declining faith in the
progress of religion but a steady belief in the desirability of its
greater influence. The results by economic groups are particu-
larly significant.


Question: Do you think that religion is losing or gaining influence in the
United States?

                   Economic Groups
                   A      B      C      D
                   %      %      %      %
[Losing]          39     44     45     46
[Gaining]         30     30     27     27
[Neither]         17     14     13      9
[Don’t know]      14     12     15     18
Total %          100    100    100    100


Question: Do you think that religion should have more influence in the
United States?

                   Economic Groups
                   B      C
                   %      %
[Yes]             76     76
[No]              16     13
[Don’t know]       8     11
Total %          100    100













These results indicate that the intellectual and well-to-do 
groups from whom leadership might be expected, are more 
skeptical about religion than are the lower economic groups. 
It may also be said that the greater belief of the lower economic 
groups represents a desire for a refuge from life, or a desire for 
help, comparable to the attitude shown by these groups in 
respect to practically all economic issues representing govern- 
mental paternalism. The many studies of the Institute of 
Public Opinion, when analyzed, show that on practically every 
issue involving governmental paternalism, the lower economic 
groups, the young people who have not yet achieved economic 
independence, and the older people many of whom have lost 
their economic independence, are more often in favor of gov- 
ernment assistance than are other groups. 


GOVERNMENT AND BUSINESS 


The question of the relationship between government and 
business represents, today even more than before, one of the 
critical issues in the country. In this case we present, in addi- 
tion to the results of the February and October studies based 
on 4402 and 5000 interviews, the results in May, 1936, and May, 
1937, based on 10,000 personal interviews each. 


Question: Do you believe that the present government is helping or hurt-
ing business?

                  May     Feb.    May     Oct.
                  1936    1937    1937    1937
                   %       %       %       %
Helping            51      70      58      58
Hurting            24      18      18      25
Don’t know         25      12      24      17
Total %           100     100     100     100

By Economic Groups, Oct. 1937

                   A       B       C       D
                   %       %       %       %
Helping            39      54      62      65
Hurting            46      31      21      16
Don’t know         15      15      17      19
Total %           100     100     100     100










Since the high point reached in February, 1937, when 70 per 
cent said helping and 18 per cent said hurting, there has been a
considerable change in the reverse direction. In the October
study the percentage of those who answered hurting reaches
its highest point so far. We see here the transition from the 
large percentage who in May, 1937, say don’t know, into the 
percentage who in October say hurting. In respect to the eco- 
nomic groups, we see again the effects of the attitude toward 
paternalism, namely the larger proportion of people as we go 
down the economic scale who believe that government is helping 
rather than hurting business. This issue promises to become 
one of the most fundamental issues in respect to the future 
developments in our national economy and political system. 


POPULARITY OF POLITICAL AND BUSINESS LEADERS 

The Fortune magazine survey report in the October issue 
used a method to determine the relative popularity and un-
popularity of industrial leaders. The result gave a decidedly
one-sided picture because it was based on the method of Un-
aided Recall. We decided to try a method which would pro-
duce a different type of result, also one-sided, but in a different
direction. The method used was that of Aided Recall and the 
question was as follows: 


Question: Here is a little popularity contest: Which of the fol-
lowing men do you think well of, which not well of? (Read
each name, and check that one. If don’t know or neither, enter
D.K.)


After this question the name of each individual was read off 
to the person being interviewed. The names included, Henry 
Ford, Walter Chrysler, Owen D. Young, Alfred P. Sloan, Jr., 
Myron C. Taylor, John L. Lewis, William Green, J. Edgar 
Hoover, Franklin D. Roosevelt, and several others. The results 
in respect to these names will not be published in view of the 
method used. Nevertheless, they represent an interesting con- 
tribution toward a technique by which changes in the popular- 
ity of these individuals can be measured. 










COMMUNISM AND FASCISM 


In asking the questions in respect to communism and fascism, 
we did not expect to receive answers which could be taken as 
final. Neither communism nor fascism is definitely under-
stood by many people. Nevertheless, this does not prevent
people from having some kind of reaction to the very words 
themselves, partly rational, maybe largely emotional. More- 
over, again the purpose is to establish a base for measuring 
trends or changes in public opinion. 


Question: Do you believe that the United States is on the way to com-
munism?

                  Total        Total
                Feb. 1937    Oct. 1937
                    %            %
Yes                20           14
No                 64           64
Don’t know         16           22
Total %           100          100

By Economic Groups, Oct. 1937

                   A      B      C      D
                   %      %      %      %
Yes               10     12     14     19
No                77     73     64     46
Don’t know        13     15     22     35
Total %          100    100    100    100


The proportion who answered yes in February was very high, 
and may have been due to the fact that the public tends to 
identify all forms of violence and lawless activity with com- 
munism. It will be remembered that such activities were far 
more prevalent in February than in October. 


Question: Do you believe that the United States is on the way to Fascism? 








              Total     A      B      C      D
                %       %      %      %      %
Yes              9      8      8      8     11
No              66     71     75     66     50
Don’t know      25     21     17     26     39
Total %        100    100    100    100    100
















Apparently a considerably smaller proportion of the public 
either knows about or is in fear of fascism than knows about 
or is in fear of communism. 


ATTITUDES OF FARMERS 


In the October study, a number of interviews were also made
with farmers. The sample, even though widely distributed,
was too small to produce a statistically reliable result.
With this qualification in mind, there were some
questions in respect to which the results of the farm interviews
were very much like those in the urban interviews. A few of
the significant differences, however, were:


A definitely greater fear of communism. 
A definitely larger proportion who believe that the govern- 
ment is hurting rather than helping business. 


In general, the proportion of farmers answering don’t know 
to various questions was larger than among the urban popula- 
tion, indicating probably a lesser interest or a lesser knowledge 
about some of the issues raised by these questions. 


CONCLUSIONS 


The above results in the studies reported were not intended 
to produce conclusions other than those applying to methodol- 
ogy or techniques of studying public opinion. Probably the 
most important point about such studies is the possibility they 
reveal in respect to measuring changes in public attitude both 
by totals and by economic groups. The wording of the ques- 
tions used is, of course, very important and the different results 
which can be obtained by slight changes in the wording of 
questions have already been reported in this journal. Although 
the Psychological Corporation has been making such studies 
since 1932, beginning with its studies of public opinion toward 
the N.R.A. and relief policies, its purpose has been to develop 
techniques and to show possibilities rather than to conduct a 
periodic barometer of public opinion similar to that of the













Institute of Public Opinion conducted by Dr. George Gallup. 
Psychologists interested in the analysis of social trends over 
a period of time, by economic groups, by sex, and by age, 
will find in the reports of this Institute a wealth of valuable 
information. 

NOVEMBER 11, 1937.











NEWS AND NOTES 


An American Association of Applied Psychologists was organized at 
Minneapolis August 30 and 31, 1937. It succeeds the Association of 
Consulting Psychologists and will continue the Journal of that society
but consists of four sections covering all the chief fields of application 
of psychology. About four hundred psychologists were in attendance at 
the organization meetings. Dr. Douglas Fryer, of New York University, 
was chosen President, Dr. Horace B. English, of Ohio State University, 
Executive Secretary, and Dr. Edward B. Greene, University of Michigan, 
Treasurer. Section Chairmen chosen were: for Clinical Psychology, Dr. 
Andrew Brown, of the Institute of Juvenile Research of Illinois; for 
Consulting Psychology, Dr. Richard Paynter, Long Island University; 
for Educational Psychology, Dr. P. M. Symonds, of Columbia University ; 
for Industrial and Business Psychology, Dr. H. E. Burtt, Ohio State 
University. 

Membership is by election of the Executive Council upon application of 
the candidate and recommendation of two sponsors. The minimum formal 
requirements are as follows: 

(F) Fellow, with Ph.D. or equivalent degree or certificate of training 
in psychology or applied psychology, and either, (a) Experience: four 
years practice in the application of psychology as a science, largely under 
own guidance, or (b) Research: systematic published research of signifi- 
cant value in the applications of psychology beyond the doctoral disserta- 
tion. (For the present an equivalent for the doctorate may be accepted 
at the discretion of the Council, defined as two or more years of directive 
or supervisory practice in addition to the four years required under (a) 
above.) 

(A) Associate, with Ph.D. or equivalent degree or certificate of train- 
ing in psychology or applied psychology and one (1) year practice in the 
application of psychology under the direction of a psychologist with 
qualifications of a Fellow. (For the present an equivalent for the doc- 
torate may be accepted at the discretion of the Council defined as two or 
more years of practice largely under own guidance in the application of 
psychology in addition to one year required above.) 

Both Fellows and Associates will vote on the business of the American 
Association; only Fellows may hold office in the Council. 

Applications for membership blanks may be obtained from Dr. H. B. 
English, Executive Secretary of the Association, by writing him at the 
Department of Psychology, Ohio State University, Columbus, Ohio. 




Preliminary announcement of the annual meeting of the American 
Association for the Advancement of Science, to be held in Indianapolis 
from December 27 to 29, has been received. In addition to the usual 
program of contributed papers of Section I (Psychology) there will be 
on Tuesday, December 28, a symposium on ‘‘The Endocrines as Related 
to Behavior,’’ under the chairmanship of Dr. Calvin P. Stone, Stanford 
University. The Secretary of Section I, Dr. Leonard Carmichael, Uni- 
versity of Rochester, Rochester, N. Y., will be happy to receive and endorse 
the applications of Members and Associates of the A. P. A., and to answer 
questions concerning the work of Section I. 


The Fourth Institute on the Exceptional Child was held under the 
auspices of the Child Research Clinic of the Woods School at Langhorne, 
Pennsylvania, on Tuesday, October 26. The subject of the Conference 
was ‘‘New Contributions of Science to the Exceptional Child.’’ Among
the speakers were Dr. Esther Lloyd-Jones, Teachers College, Columbia 
University; Dr. Paul Schilder, New York University; Dr. Leo Kanner, 
Johns Hopkins Hospital; Dr. Louis A. Lurie, University of Cincinnati, 
and Dr. Fritz B. Talbot, Boston, Mass. 

The Second National Conference on Educational Broadcasting will be 
held at the Drake Hotel, Chicago, Illinois, November 29, 30, and December 
1, 1937. Among the objectives, formulated by a committee, are the 
following: 1. To provide a national forum where interests concerned with 
education by radio can come together to exchange ideas and experiences; 
2. To examine and appraise the situation in American broadcasting as a 
background for the consideration of its present and future public service; 
3. To examine and appraise the interest of organized education in broad- 
casting; 4. To bring to a large and influential audience the findings that 
may become available from studies and researches in the general field of 
educational broadcasting, particularly such studies and researches as may 
be conducted by the Federal Radio Education Committee. Dr. Lyman 
Bryson, of Teachers College, will serve as leader of all the discussions 
which follow the general sessions. Dr. George F. Zook, President of the 
American Council on Education, will act as Conference Chairman. Further 
information concerning the Conference may be obtained from Dr. C. S. 
Marsh, the Executive Secretary, 744 Jackson Place, Washington, D. C. 


During the first two weeks of November an opportunity will be given 
to those interested to inspect a number of photographs of the experi- 
mental classes for gifted children, conducted jointly by the Board of 
Education of New York City and Teachers College at P.S. 500, Speyer 
School. These children, testing between 130 and 200 IQ, are representative 
of the most intelligent one per cent of the rising generation. The curriculum 
established for ordinary children cannot challenge them nor 
extend their mental powers. Accordingly, the Speyer School was established 
as a laboratory seeking to develop scientific methods of extending 
their educational opportunities. It is hoped that the exhibition will in 
some measure inform the public of the appearance and personalities of 
the exceptional human in childhood and give a visual impression of what 
society is now doing to utilize his intelligence. The photographs will be 
on view at the library of Teachers College, Columbia University, from 
October 1 through November 13. The photographs are the work of 
Edward Anhalt, Educational Film Director, collaborating with Professor 
Leta S. Hollingworth, Teachers College. 


First results in the first experiments in telepathy ever conducted on a 
nation-wide scale were announced recently during the broadcast of the 
Zenith Foundation program over the NBC-Blue Network. Although 
Commander E. F. McDonald, Jr., sponsor of the program, and several 
university scientists who supervised the experiment are unwilling to draw 
a conclusion from the results of the first test in the series, the answers 
mailed in by thousands of listeners and tabulated are approximately one- 
third above the mathematical laws of chance in point of correctness. 
Listeners were asked to determine whether a selecting machine brought 
into view a black space or a white space. Ten “senders” of proven 
telepathic ability were in the NBC Chicago Studio to attempt to transmit 
the correct answers to the listeners by telepathy. The machine was 
operated seven times, but the third and seventh trials were blanks. 
Approximately 20 per cent of those who replied called four of the five 
trials correctly, and 4 per cent called all five correctly. Other tests in 
the series will be announced as given. 
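
As a rough check on the percentages reported above, the chance expectation for five two-choice trials can be worked out directly. The sketch below assumes that each trial is an independent guess with probability 1/2 of being correct; the note does not say how the machine's selections were distributed, so this is an assumption, and the function name is illustrative only.

```python
from math import comb


def chance_probability(correct: int, trials: int = 5, p: float = 0.5) -> float:
    """Binomial probability of exactly `correct` hits in `trials` guesses.

    Assumes independent guesses, each correct with probability `p`
    (p = 0.5 is an assumption for a black-or-white choice).
    """
    return comb(trials, correct) * p**correct * (1 - p) ** (trials - correct)


four_of_five = chance_probability(4)  # 5/32
five_of_five = chance_probability(5)  # 1/32

print(f"four of five by chance: {four_of_five:.1%}")
print(f"all five by chance:     {five_of_five:.1%}")
```

Under that assumption the chance values are about 15.6 per cent and 3.1 per cent, against the reported 20 per cent and 4 per cent. Both reported figures are roughly 28 per cent above chance, which is broadly in line with the "approximately one-third" stated in the note.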





BOOK REVIEWS 


ALFRED ADLER, FRANZ ALEXANDER, TRIGANT BURROW, ELTON MAYO, PAUL 
SCHILDER, DAVID SLIGHT, HARRY STACK SULLIVAN, EDWARD SAPIR AND 
HERBERT BLUMER. “The Contribution of Psychiatry to the Understanding 
of Human Society.” Amer. J. Sociol., 1937, 42, 773-877. 

To review, in a small space, a symposium in which each contributor 
has been brief and to the point, seems to me an impossible task. Seven 
noted psychiatrists have presented essays on the topic stated, and Drs. 
Sapir and Blumer have commented upon them. Persons who want only 
an abbreviated form of the symposium should therefore read the two 
final articles. 

It is perhaps unfortunate that the seven psychiatrists could not have 
met and discussed their positions. Much repetition could have been 
avoided. A certain number of points could have been put down, as the 
lawyers say, by stipulation, as for example the view that personal dis- 
organization may have as its projection social disorganization, that child- 
hood experiences are mostly to blame for personal disorganization, etc. 

We might then have ventured to ask for more specificity in the treatment 
of various problems. There is a notable tendency to 
deal in glittering generalizations. I think a worthy exception to this is 
the paper by Dr. Alexander, who discusses two important errors in this 
borderline field of social psychiatry or psychiatric sociology. One he 
calls “psychoanalyzing society,” which amounts to a tacit acceptance 
of the group-mind fallacy. Thus some enthusiasts speak of war as 
expressing a nation’s sadism and need for punishment. The question 
may well be asked: whose sadism and whose need for punishment? The 
persons who start a war are not those who fight the war! 

The second error to which Alexander refers is that of misinterpreting 
such sociological concepts as “culture pattern” in dealing with the individual 
case. We must remember that the person grows in an environment 
of other individuals, and that “culture pattern” is a statistical 
concept relating to the mode of action of a large number of people. 
Hence the growth of an individual in a given culture area does not auto- 
matically prove that he was subjected to these culture patterns. We 
must know the behavior of the specific persons who constituted his psychological 
milieu. (On the other hand, I think it is important to know 
whether the “environmental” personalities accept or deviate from the 
culture pattern.) 




The bibliography accompanying the symposium is overloaded with 
traditional psychiatric references. I should be inclined to substitute 
a few recent books such as Freeman’s Social Psychology, in which the 
task of relating the individual to his social milieu through the medium 
of values is very efficiently handled; or Brown’s Psychology and the 
Social Order, in which the psychological problems of group behavior are 
treated in a novel and stimulating way; or Sherif’s Psychology of Social 
Norms, which approaches in still another way this difficult problem of the 
individual and his culture. 

It seems to me that all of these moot problems go back to a question 
which is hinted at by Dr. Burrow’s paper, and treated, though incom- 
pletely, by Dr. Sullivan’s, namely, the specific psychological processes by 
which the social milieu becomes a part of the individual personality, and 
conversely, by which the individual becomes an effective part of his social 
system. These are matters of concern to the specialists in child psychol- 
ogy and learning, for it is generally agreed that childhood determines 
these reactions. 

We must remember that culture patterns and social systems do not 
exist within individuals, but that habits and values can. Further we must 
recognize that social disorganization may be in the form of a conflict 
between individuals who are within themselves well integrated. I take 
it, for example, that the present warfare of labor and capital constitutes 
a phase of social disorganization. But, for any particular person, this 
economic conflict may be a cause of his individual problems, or his par- 
ticipation may be a result of neurotic trends, or he may show no personal 
disorganization at all. 

These reflections suggest that the psychology of social disorganization 
is a much larger problem than the psychiatry of social disorganization; 
and that an important theoretical need in social psychology is for a sys- 
tematization of our knowledge of the relationships between personality 
and the social system. This is not to be interpreted as a deprecation of 
the present symposium, but merely a recognition of a limitation upon its 
scope. Within that scope—large enough, in all conscience—it presents 
numerous ideas which psychiatrists, sociologists and psychologists can 
study with much profit. 

ROSS STAGNER, 
University of Akron. 


FOWLER D. BROOKS. Child Psychology. Boston, Houghton Mifflin Co., 
1937. Pp. xxx+600. 
Another book on child psychology would ordinarily require little more 
than passing mention. However, this book of Dr. Brooks, written with 
the collaboration of Laurance F. Shaffer, stands out as a compact, integrated 
and lucidly written account of the psychological aspects of childhood 
from birth to twelve years of age. 

The first chapter introduces the methodology of child psychology, and 
is followed by two chapters discussing the biological and learning bases 
of development. In following chapters the behavior of the neonate, 
growth of the physical organism, motor skills, language, mental functions, 
and intelligence, are carefully considered. Nine chapters are devoted to 
problems of emotion, personality, social adjustment and guidance of children. 
An excellent chapter, not ordinarily found in psychological textbooks, 
discusses the principles and needs of physical and mental hygiene 
of childhood. 

This is not a textbook written to support some personal inclination 
nor any established theoretical position. On the other hand it is not a 
mere compilation. Dr. Brooks has integrated a very great amount of 
modern literature into an excellent handbook. Nearly every page has 
footnote references to the literature. It is unfortunate that the author 
did not feel his references were important enough to make them complete. 
Dates and titles of journal articles are never given, and usually only 
titles of books are cited. These lacks make the bibliographical footnotes 
almost useless to the reader. In spite of the inadequacy of the references 
one can do nothing else but recommend this book wholeheartedly to all 
those interested in the problem of childhood. 

C. M. LOUTTIT, 
Indiana University. 


JOHN EDWARD BENTLEY. Superior Children. New York, W. W. Norton 
& Co., 1937. Pp. xxiii+331. $3.00. 

One of the paradoxes of our social culture is the shortsightedness in 
training those who will be the inevitable leaders of the future—our 
superior boys and girls. Probably not more than one tenth as much is 
spent in special educational facilities for the superior child as is being 
spent for the training of subnormal and feebleminded children. This 
paradox has, of course, been long recognized by prominent educational 
leaders, but the rank and file have paid the needs of the gifted child only 
lip service. Dr. Bentley has written a book about the superior child that 
is a worthy addition to the growing literature on this group. 

The fundamental interest of the book is in the education of superior 
children, but there are chapters presenting a summary of the physical 
and behavioral characteristics of the group, as well as a description of 
tests and methods for discovering the gifted. Five chapters are particularly 
devoted to education; these describe a number of special plans 
at present in operation and the qualities of the teacher of superior children, 
and in an appendix several detailed programs are presented. There is a 
chapter on the gifted girl including comments on a number of famous 
women. In another chapter there are brief biographies of several famous 
men suitable for class use. 

The weakest chapter in the book is the fourth in which is discussed the 
question of inheritance of superiority. The author does not offer a very 
strong brief for inheritance, although he is apparently convinced of its 
primary importance. In general the chapter organization and inequitable 
distribution of space devoted to various topics might be severely criticised. 
While the book cannot be considered an exhaustive study of the superior 
child from a psychological or even educational point of view, it will have 
a definite usefulness for certain groups. This reviewer would suggest 
that it be made required reading for all school superintendents and more 
particularly for all school board members. 

C. M. LOUTTIT, 
Indiana University. 


OSCAR K. BUROS. Educational, Psychological, and Personality Tests of 
1933, 1934, and 1935. Rutgers University Bulletin, Vol. XIII, No. 1. 
(Stud. Educ. No. 9). 1936. 83 p. 

Educational, Psychological, and Personality Tests of 1936. Rutgers 
University Bulletin, Vol. XIV, No. 2A. (Stud. Educ. No. 11). 1937. 
141 p. 

EARL B. SOUTH. An Index of Periodical Literature on Testing. New 
York, Psychological Corp., 1937. xii+286 p. 

In 1933 Dr. Hildreth published her extensive bibliography of psycho- 
logical and educational measuring instruments containing in the neigh- 
borhood of 3,000 entries. This monumental work was a spectacular 
indication of the extensive activity during a third of a century in the 
devising of tests. One might even have ventured the opinion that probably 
less than a third of the total number would have served all the real needs 
of psychological and educational measurements. That this opinion has 
not been universally held is evident from the bibliographies of Dr. Buros. 

As a supplement to Dr. Hildreth’s bibliography, Buros published in 
1935 a list of tests that had been published during 1933 and 1934. The 
following year this list was enlarged to include tests of 1935. In the first 
volume listed above, there are 503 testing instruments listed, classified by 
subject, and supplied with alphabetical author and title indexes. The 
1937 volume continues the listing with an addition of 365 more tests or a 
total of 868 in all. The author and title indexes include only entries in 
the second volume. This volume also introduces an innovation in the 
bibliographical tools of psychology in the form of a book review digest 
of books on measurements. Following the style of the familiar Book 
Review Digest, 291 measurement books published during 1933 to 1936 
inclusive are listed with quotations from reviews which have appeared in 
the psychological and educational journals. 
Dr. South’s volume is an invaluable addition to those of Drs. Hildreth 
and Buros, as well as to the literature of psychological measurements. 
This is a carefully selected list of 5,005 papers on testing which appeared 
in the periodical literature between 1921 and 1936. The bibliography 
includes not only papers describing or evaluating tests, but also those on 
methods, studies made with tests in educational, clinical, industrial, etc., 
situations. A list of subjects from the index gives a suggestion as to the 
wealth of material included: ability, achievement, various school subjects, 
aptitudes, attitudes, behavior, character, clinics, college, delinquency, 
feebleminded, growth, guidance, homogeneous grouping, intelligence, 
music, quotients, race, sex, statistics, vocations. The subject index from 
which this list is taken is an excellent cross-reference index of all subjects 
in the various titles included in the author list. 
In a field such as psychology, where the literature appears in many 
journals, often unexpected ones, bibliographies are always of value. How- 
ever, the compilation of a good bibliography is laborious and it is seldom 
given adequate recognition by members of the profession. The bibliogra- 
phies of Drs. Buros and South are noteworthy and every psychologist and 
educator interested in measurements is indebted and should be grateful 
to them. The publication of bibliographies is expensive and the sale is 
small, therefore Rutgers University and the Psychological Corporation are 
to be complimented in supporting these ventures. 
C. M. LOUTTIT, 
Indiana University. 








NEW BOOKS AND PAMPHLETS RECEIVED 


Books and pamphlets for review should be sent to James P. Porter, 
Editor, JOURNAL OF APPLIED PSYCHOLOGY, Ohio University, Athens, Ohio. 


Addresses on Industrial Relations, 1937. Bureau of Industrial Relations, 
Bulletin No. 6, University of Michigan, Ann Arbor, Michigan, 1937. 
46 pp. 

Child Psychology. NOEL B. CUFF. The Standard Printing Company, 
Louisville, Ky., 1937. 299 pp. 

The Definition of Psychology. FRED S. KELLER. D. Appleton-Century 
Company, New York, 1937. 116 pp. 

Directing Study Activities in Secondary Schools. W. G. BRINK. Doubleday, 
Doran & Company, Garden City, N. Y., 1937. $3.00. 738 pp. 

Educational, Psychological and Personality Tests for 1936. OSCAR K. 
BUROS. Rutgers University, New Brunswick, N. J., 1937. 141 pp. 

The First Fifty Years: An Administration Report. WALLACE W. 
ATWOOD. Clark University, Worcester, Mass., 1937. 120 pp. 

General and Social Psychology. Revised Edition. ROBERT H. THOULESS. 
University Tutorial Press, St. Giles, High Street, London, England. 
8s. 6d. 522 pp. 

Getting Along in College. LOWRY S. HOWARD AND HERBERT POPENOE. 
Stanford University Press, Stanford University, Calif., 1937. 58 pp. 

An Index of Periodical Literature on Testing. EARL BENNETT SOUTH. 
Psychological Corporation, New York, 1937. 286 pp. 

Influencing the Buyer’s Mind. CHARLES BENNETT. American Efficiency 
Bureau, Hill Building, St. Louis, Mo. 136 pp. 

Logic—Theoretical and Applied. D. LUTHER EVANS AND WALTER S. 
GAMERTSFELDER. Doubleday, Doran & Company, Garden City, N. Y., 
1937. $2.50. 482 pp. 

Motion Pictures in Education. EDGAR DALE, FANNIE W. DUNN, CHARLES 
F. HOBAN, JR., AND ETTA SCHNEIDER. H. W. Wilson Company, New 
York, 1937. $2.50. 475 pp. 

Primitive Intelligence and Environment. S. D. PORTEUS. Macmillan 
Company, New York, 1937. $3.00. 325 pp. 

Psychology and Life. FLOYD L. RUCH. Scott, Foresman & Company, 
New York, 1937. 679 pp. 

Psychology of Personality. ROSS STAGNER. McGraw-Hill Book Company, 
New York, 1937. $3.50. 465 pp. 




Pupil Rating of Secondary School Teachers. ROY C. BRYAN. Bureau of 
Publications, Teachers College, Columbia University, New York, 1937. 
$1.60. 96 pp. 

The Questioning Mind. RUPERT CLENDON LODGE. E. P. Dutton & Company, 
New York. $2.75. 312 pp. 

Research Memorandum on the Family in the Depression. SAMUEL A. 
STOUFFER AND PAUL F. LAZARSFELD. Social Science Research Council, 
230 Park Avenue, New York, 1937. 221 pp. 

The Science of Seeing. MATTHEW LUCKIESH AND FRANK K. MOSS. D. 
Van Nostrand Company, New York, 1937. $6.00. 547 pp. 

Scientific Salesmanship. CHARLES BENNETT. American Efficiency Bureau, 
St. Louis, Mo., 1937. 128 pp. 

Statistics in Psychology and Education. Second Edition. HENRY E. 
GARRETT. Longmans, Green & Company, New York, 1937. $3.50. 
493 pp. 

Studies in Experimental Phonetics. Archives of Speech. JOSEPH TIFFIN, 
Editor. Department of Speech, State University of Iowa, Iowa City, 
Iowa. $1.00. 60 pp. 

Try Living. WILLIAM MOULTON MARSTON. Thomas Y. Crowell Company, 
New York, 1937. $1.75. 228 pp. 

Variability in Results from New-Type Achievement Tests. EARL V. PULLIAS. 
Duke University Press, Durham, N. C., 1937. $1.00. 100 pp. 






































